• Open

    [D] Help with Understanding Diffusion Models: A Unified Perspective.
    I'm trying to read through the paper Understanding Diffusion Models: A Unified Perspective and came across this section: https://preview.redd.it/ykkctwhmhwob1.png?width=2346&format=png&auto=webp&s=c7595aae52a8ee22813c840a40a6d29dcf773a10 I think I kind of get what is going on here but not clearly. For one, what exactly is a Monte Carlo estimate? I tried looking online but didn't get many good results. I'm having trouble understanding why https://preview.redd.it/yazmfzg1iwob1.png?width=380&format=png&auto=webp&s=dbbbf80e85a95cd96d8e1ede73e9f8ba1e6e9096 is approximately equal to: ​ https://preview.redd.it/lbw36em7iwob1.png?width=464&format=png&auto=webp&s=46fb3ebcd02fb4b772b1be51cd59d60d3a1cf438 where z is sampled from q. Secondly, what exactly does L that z is indexed by refer to? The number of samples X or what exactly? submitted by /u/lumijekpr [link] [comments]  ( 9 min )
    [R] Shattering all 2-input binary functions
    I'm looking for the simplest model that can fit all 16 (222) possible 2-input binary functions I used the term "shatter" from VC dimension, which does not give a constructive approach to building the model submitted by /u/hnsmn [link] [comments]  ( 9 min )
    [P] Is 20s per Step on an EfficientNet-B4 CNN normal?
    I'm getting 20 seconds per step while training a 244x244x3 EfficientNet-B4 model. The batch size is 20, with 8 classes. Since I have about 5000 images, that makes each epoch around an hour and a half. Looking at models online, it seems like people get step durations in the milliseconds. Is it a problem on my end? Running on Google Colab free version. submitted by /u/hnknerd [link] [comments]  ( 9 min )
    [D][P] How to get the 3D pose estimations from an Image or Video?
    Hi, I'am trying to get the 3D keypoints coordinates from an image or video and then map it to SMPL model. It's pretty easy to get the keypoints from an image or video using the mediapipe library. But the mapping of it with the SMPL model is something that I can't figure out. mainly because the skeleton structure is different. Some already had a similar issue but the answers were not clear and he didn't even ask futher. Is it possible to do this?? if it's not possible with mediapipe is there some other library that I could use?? I heard about openPose too but when I tried it didn't work someone was saying it works only on windows 11. There are some other parts also to this project which will mostly be dealt with Pytorch. There are some pose estimators in TensorFlow but I want to stick to pytorch hence would like some pose estimators in that framwork, or a library or somehing inside opencv https://preview.redd.it/w9mioiyxhvob1.png?width=951&format=png&auto=webp&s=3886c356513b62efbcaddaa76841457cf3eb22e5 https://preview.redd.it/xjal9kyxhvob1.png?width=506&format=png&auto=webp&s=7f544a3050fbd744d300d2bf6e1a286a4014ece5 submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [R] EarthPT: how to superscale LLMs with large observation models
    submitted by /u/Smith4242 [link] [comments]  ( 9 min )
    [Discussion] [Research] How to Add Furniture to an Empty Room Using ControlNet-MLSD, so the model learns to keep the exact room pixels?
    Hello all, I've come across a fascinating example of virtual staging and I'm looking for some technical advice. Here's the image: https://preview.redd.it/3vbw441eeuob1.png?width=2511&format=png&auto=webp&s=679bc62f0cb61d479fe6dc6ce93af4f8846b8cea I get how ControlNet-MLSD is used to generate the lines and structure of the empty room. My question is, how is the furniture generated and added to the room without messing up the pixels, making it look as realistic as in the example? submitted by /u/dexter-dot [link] [comments]  ( 9 min )
    [D] Pinecone vs PgVector vs Any other alternative vector database
    Hi Everyone, Which vector database would be efficient and affordable for a enterprise chatbot? I tried Pinecone, its was simple to integrate with my python backend. But it's not open-source and its pricing it bit concerning. So Please suggest an alternative. submitted by /u/Free_Conversation106 [link] [comments]  ( 9 min )
    [D] Am i thinking backpropagation right?
    Basically i wanted to understand how backprop is done in neural networks and how i should be implementing it, so i did what i always do - the math. I just want to know if what i though up is even usable in practice or not. Here is my math. submitted by /u/EnderPoint07 [link] [comments]  ( 9 min )
    [Discussion] Question on the paper named, SELF-ATTENTION DOES NOT NEED O(n 2 ) MEMORY from Google.
    Hi, all. ​ I just read the paper named " SELF-ATTENTION DOES NOT NEED O(n 2 ) MEMORY" from Google. I understood that it requires O(1) for a single query, but still cannot understand why it requires O(log N) for self-attention and different order input. ​ It seems like adding one index into a sequence requires O(log N) (The paper's saying this). But why does it take O(log N)? Isn't it just O(1)? Because it is just adding a single datapoint for the index. ​ I really hope someone understands why it is and leaves any comment on this. Here's the paper. https://arxiv.org/abs/2112.05682 ​ Thanks in advance. submitted by /u/Maximum_Performance_ [link] [comments]  ( 9 min )
    [D] Convert ONNX model to WASM format
    I need some help regarding the process of converting ONNX model to WASM format I created ELECTRA discriminator model with my own config, then convert the Pytorch model to ONNX format. After that, I quantized the model to 2mb. The model will be used for text classification. Now I want to convert it to WASM, but I'm literally stucked and dont know how to proceed I need some suggestions on how to proceed Please help, thank you submitted by /u/Ellzaf [link] [comments]  ( 9 min )
    [D] what architecture to use with correlated data samples?
    LIke the title says, i have correlated data samples and a covariance matrix among them. if i use a fcnn i can only consider the samples i.i.d. and the use either the MSE or THE MLE as loss function. but the data samples are not independent, so what architecture would allow me to use the full covariance matrix among the samples? transformers? submitted by /u/ilrazziatore [link] [comments]  ( 9 min )
    TMLR header coming up when trying to upload paper to arxiv [D]
    I have written and submitted a paper to TMLR and also am uploading it to arxiv. However, even after using \usepackage[preprint]{tmlr}, I'm getting "Under Submission at TMLR". Should this happen. If not, where am I going wrong? submitted by /u/filletedforeskin [link] [comments]  ( 9 min )
    [D] Can overtraining be considered a subset of the alignment problem?
    i.e the goal of learning to model the empirical distribution is misaligned with the goal of modeling the "true" distribution. I've found this framing helpful for describing regulirization heuristics to people, is this a valid way of viewing it? submitted by /u/Cartesian_Carrot [link] [comments]  ( 9 min )
    [D] Any materials on machine learning applied to prosthetics?
    I have asked this question in other subreddits but no one answered me yet.I've googled it, but maybe some kind people who actually have worked or are working in this field would share some resources. Maybe there are some books or papers that are very explanatory and directly show what problems can be solved by using ML in prosthetics, how and etc. Maybe there are introductory textbooks or must-read papers. submitted by /u/tenderwrath [link] [comments]  ( 9 min )
    [R] The Rise and Potential of Large Language Model Based Agents: A Survey - Fudan NLP Group miHoYo Inc 2023 China - Github repository includes over 100 Papers with github links!
    Paper: https://arxiv.org/abs/2309.07864 Github: https://github.com/WooooDyy/LLM-Agent-Paper-List Abstract: For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent AI agents since the mid-20th century. However, these efforts have mainly focused on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a sufficiently general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse …  ( 10 min )
    [D] Alternatives to this sub?
    Since the influx caused by LLMs, this sub has become almost useless to me. What are some alternatives where interesting papers are shared, research discussions take place, and which isn't flooded with LLMs, startups, or personal projects? submitted by /u/ParanoidTire [link] [comments]  ( 9 min )
    [D] Should I scale multiclass target variable?
    Hey all Please don't mind my English writing I have a dataset with scaled feature (scaled by StanderScaler) and multiple class target variable encoded as 0,1,2..6 Should I scale the target variable like the feature to increase the accuracy (current accuracy is 79%) and if so how can I do this submitted by /u/Sunday_A [link] [comments]  ( 9 min )
    [R] Factors Influencing Adoption Intention of ChatGPT
    Hello, ​ I am an information systems student currently conducting research for my undergraduate thesis on the factors that influence people's adoption intention of ChatGPT, as well as identifying the factors that may be holding them back. These factors include people's concerns about potential negative impacts of ChatGPT, such as increased unemployment and the spread of misinformation. Your participation in this study is crucial as it will provide valuable insights to help us understand how ChatGPT can be improved to meet users' needs. ​ Please note that I am not affiliated with OpenAI, no identifying information will be collected during the survey, and all responses will be kept confidential. The survey should take approximately 10 to 15 minutes to complete, and participation is voluntary. You may withdraw from the survey at any time, and there are no known risks associated with participating. ​ If you are interested in learning more about the study, please follow the link below. ​ https://docs.google.com/forms/d/e/1FAIpQLSf5HIfXHppMuTR63x00i4OuRAtM5Ti6EGybd-HuI1kmK06VPw/viewform?usp=sf_link ​ Thank you for taking the time to contribute to our research study. Your participation is greatly appreciated! submitted by /u/maulanashi [link] [comments]  ( 9 min )
    [D] How do the APIs of LLMs determine whether they should answer a question?
    When I ask questions related to security issues through the APIs of ChatGPT, Claude and other LLMs, such as inquiring how to make a bomb, the APIs of these LLMs would often refuse to answer. How do the APIs of these LLMs determine whether they should answer a question? Do they make judgments based on pre-generated responses? Or do they match keywords in the input prompt? Or do they use a classifier to identify the input prompt? submitted by /u/ShacklesLay [link] [comments]  ( 9 min )
    [R] The Rise and Potential of Large Language Model Based Agents
    People have been chasing super-smart AI for ages, hoping they could think and act like us. While we've made a lot of cool tech, we still need a killer starting point for AI that can handle all sorts of tasks. Large Language Models" (LLMs) are like a big leap toward AI that's smart across the board. People have been using LLMs to make AI that can do loads of things. The article takes us on a trip from where AI ideas started, to why LLMs rock as the backbone for AI. https://arxiv.org/abs/2309.07864 They break down this LLM-AI into three parts: the thinky bit (brain), what they sense (perception), and what they do (action). They chat about how these AI can work solo, in teams, or buddy up with humans. https://arxiv.org/abs/2309.07864 submitted by /u/QuantumAsha [link] [comments]  ( 9 min )
  • Open

    Courses in AI Usage and Utilization for Business
    Beginning new career in a couple months, would like to upskill on AI utilization and usage cases. I won’t need to code in this role but rather understand how to use existing tools in an optimal way and recommend use cases to clients. What courses would be optimal to gain that skill set? submitted by /u/iceflamemaster [link] [comments]  ( 9 min )
    Will Cyber Security Be Replaced by AI?
    AI, including ChatGPT, has narrow expertise and lacks the broad spectrum of human intelligence. The training of AI models can be costly due to hardware, data collection, and energy consumption. The trustworthiness of training data is crucial for reliable AI models, but issues like bias, labeling errors, and data privacy can affect performance. AI systems are vulnerable to adversarial attacks, such as manipulating input data to deceive the models. AI lacks genuine understanding, emotional/social intelligence, common sense/critical thinking, and true creativity. Source : https://blog.edned.net/will-ai-replace-cyber-security/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    AI Prompt Engineers: The Six Figure Job Everyone Is Talking About
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Are You Ready For AI & Automation? Take This Free Survey and Find Out.
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    help
    what app that has custom charaters and voice chat? i forgor 💀 please 😭 it has image gen roleplay rooms etc submitted by /u/roblox22g [link] [comments]  ( 9 min )
    Google changes its stance on AI generated content
    Google is rolling out its third iteration of the Helpful Content Update, which aims to classify content as either 'written for search engines' or 'written for people'. The update reflects Google's realization that it can't accurately police AI-generated content and emphasizes the importance of creating content for people-first, regardless of the means used to create it. Detecting AI content is challenging, as AI detection tools often classify content based on tone, leading to false positives. Google's change in stance is not surprising, considering their heavy investment in AI, including chatbot Bard and new search features like the Search Generative Experience. The majority of brands now openly share articles and guides on how to use AI tools to enhance marketing strategies and create actionable content plans quickly. However, the quality and value of AI-generated content remain important factors for success, as poorly generated content can harm a brand's reputation and ranking. Source : https://stackdiary.com/google-changes-its-stance-on-ai-generated-content/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
  • Open

    Elliptic curve addition formulas
    The geometric description of addition of points P and Q on an elliptic curve involves four logical branches: If one of P or Q is the point at infinity … Else if P = Q … Else if P and Q lie on a vertical line … Else … It would seem that an algorithm […] Elliptic curve addition formulas first appeared on John D. Cook.  ( 6 min )
    Rational height functions
    Mathematicians often speak informally about the relative simplicity of rational numbers. For example, musical intervals that correspond to simple fractions have less tension than intervals that correspond to more complicated fractions. Such informal statements can be made more precise using height functions. There are a variety of height functions designed for different applications, but the […] Rational height functions first appeared on John D. Cook.  ( 6 min )
  • Open

    Adversarial Reinforcement Learning
    A curated reading list for the adversarial perspective in deep reinforcement learning. https://github.com/EzgiKorkmaz/adversarial-reinforcement-learning submitted by /u/ml_dnn [link] [comments]  ( 9 min )
    This Neural Net Maps Molecules to Aromas
    submitted by /u/keghn [link] [comments]  ( 9 min )
    Luis Lamb's full talk on Learning and Reasoning in Neurosymbolic AI (JA...
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
  • Open

    What are some of the must read papers in reinforcement learning after 2020?
    I am particularly interested in the ideas that can have high research potential and impact to the RL field. submitted by /u/C7501 [link] [comments]  ( 9 min )
    Which suboptimum is harder to get out?
    An agent is tasked to learn to navigate and collect orbs: Solution space in blue View Poll submitted by /u/FriendlyStandard5985 [link] [comments]  ( 9 min )
    Build-in strong agents in petting-zoo/melting-pot
    Hi, I would like to try test the adversarial policy (https://arxiv.org/abs/1905.10615) in petting-zoo/melting-pot environment. I wonder if there are any built-in agents besides random? Do you know any repos with Sota agents in one of those environments? submitted by /u/MrCogito_hs [link] [comments]  ( 9 min )
    Multi-GPU PPO troubles
    Hi all, I am training a small model (120k params) on a custom grid-world environment I built with JAX. I was able to train the model very well with PPO on 1 GPU, but when I scaled to multiple GPUs (tried with 6 in parallel), the training curves showed a lot more variance than what I was seeing on 1 GPU. I did not change the hyperparams, I just spawned the same number of environments (~7000 per GPU) on multiple devices. The multi-GPU setup works in the following way: - I keep parallel independent buffers, one on each device - I initialize identical models on each device - I get independent gradients on each device at the update step, then I take the mean of the gradients across the devices and then I backpropagate the same gradients on each device independently. (I checked that after some time the models are still identical, and that is the case). Now the question is, what could be the reason for such an increase in variance? What can I try to mitigate the problem? Here's a comparison of the entropy curves... P.S. The model still trains quite well, but I guess that if I manage to make the curves smoother it is going to train much faster and to a better performance. https://preview.redd.it/4m01uirjfsob1.png?width=1826&format=png&auto=webp&s=1e1a79b9f4cdefe019bb16ccb7e11fd92dd261e3 submitted by /u/arbueticos [link] [comments]  ( 9 min )

  • Open

    Simple explanation of convolutional neural network | Deep Learning Tutorial 23 (Tensorflow & Python)
    submitted by /u/keghn [link] [comments]  ( 8 min )
    Grounding DINO Explained
    Hi there, I've created a video here where I explain how the Grounding DINO model works for open-set object detection. I hope it may be of use to some of you out there. Feedback is more than welcomed! :) submitted by /u/Personal-Trainer-541 [link] [comments]  ( 9 min )
    Newsletter in Hardware Acceleration in Robotics #77
    https://news.accelerationrobotics.com/hardware-acceleration-in-robotics-77/ Hardware acceleration in robotics news. Modi wants to make India a chip-making superpower. Can he?, What's new in China's robotics market?, July chip sales edge up, but are still well behind last year, Rockwell automation acquiring AMR developer Clearpath robotics submitted by /u/pablocarrera [link] [comments]  ( 9 min )
  • Open

    How does the SB3 DQN algorithm's `predict` function work for `deterministic=False`?
    I noticed that the default argument for `deterministic` in DQN is false. But how would that work? Typically DQN is trained with a deterministic function approximator. How would the algorithm become stochastic during inference time? In DQN the final layer activation is linear and therefore I don't see how one could even make this algorithm stochastic, unlike policy gradient where the final layer is softmax or Normal. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
    How does recurrent neural network implements model based RL system purely in its activation dynamics(In blackbox meta-rl setting)?
    I have read these papers "learning to reinforcement learn" and "PFC as meta RL system". The authors claim that when RNN is trained on multiple tasks from a task distribution using a model free RL algorithm, another model based RL algorithm emerges within the activation dynamics of RNN. The RNN with resulting activations acts as a standalone model based RL system on a new task(from the same task distribution) even after freezing the weights of outer loop model free algorithm of that. I couldn't understand how an RNN with only fixed activations act as RL? Can someone help? submitted by /u/C7501 [link] [comments]  ( 9 min )
    Seeking Guidance on Reinforcement Learning for Optimal Power Market Bidding Strategy
    I'm currently exploring the application of reinforcement learning to address a challenge within the power market. Specifically, I'm focused on devising an optimal strategy for electricity bidding, encompassing both buying and selling options, across different hours of the day. Imagine we have a power generator capable of producing up to 800 MW of electricity daily, with a charging rate of up to 200 MW per hour. After continuously charging it for four hours, it reaches its maximum capacity, and further charging is restricted until some electricity is discharged. Our dataset spans the past 3 years and contains vital information such as temperature, hydro availability, gas prices, and locational marginal prices, which are pivotal in determining profitability. For instance, if we decide to pu…  ( 10 min )
  • Open

    [P] Made a simple github tool to check GPU vRAM breakdown for any LLM. Supports GGML & bnb quantization
    submitted by /u/ExploreExploit400 [link] [comments]  ( 9 min )
    [P] Text to Image generation
    submitted by /u/No-Percentage7346 [link] [comments]  ( 9 min )
    [D] No Code ML Tools
    I'm taking a No code ML class and we are asked to choose which platform we want to use. The options are Dataiku, RapidMiner, and KNIME. Does anyone have thoughts on these options in terms of which is best/worst for someone with minimal coding experience? submitted by /u/V1ncentAdultman [link] [comments]  ( 9 min )
    [Research] Layer-Neighbor Sampling for Scalable Graph Network Training
    Hi everybody, I have been working on scalable GNN training for a while and noticed that the bottleneck of training GNNs is the graph sampling and feature fetching stages. GNN training frameworks PyG and DGL that most people use seem to default to using Neighbor Sampling for minibatch training. I am hoping that with my new paper Layer-Neighbor Sampling -- Defusing Neighborhood Explosion in GNNs, this default might be due to change. My new approach combines the layer sampling approach with the neighbor sampling approach. The result is that there is more overlap in the sampled neighborhoods and one still gets a fixed number of neighbors per seed vertex. It is even possible to turn it into a subgraph sampling approach by fixing the random seed used for sampling across all layers of the GNN m…  ( 10 min )
    [D] How do I move into CV/NLP?
    Hi guys need some advice, I have been working as a data scientist for the past 3 years, mostly in the domain of time series & predictive analytics (churn prediction/segmentation, etc.) with some deployment, hence do not currently have any major NLP/CV/Deep learning projects as such. Now, I can see that NLP/CV/Gen AI is mostly in demand and they are really enjoyable as well. How do I shift into these domains, given that new companies having these roles are asking for similar past working experience? submitted by /u/immortal_omen [link] [comments]  ( 9 min )
    [R] DEVA: Tracking Anything with Decoupled Video Segmentation
    submitted by /u/Mediocre-Bullfrog686 [link] [comments]  ( 9 min )
    [D] I want to improve my self in machine learning field
    I am beginner in machine learning field. I know python, some basic machine learnig algorithm like linear, logistic, decision tree, random forest. I did some work on jupyter notebok related to machine learning like data gathering, data preprocessing, data modeling, data training, finding accuracy, confusion matrix, precision, recall, feature generation etc. i deployed one machine learning model on Skitlearn as well. Now, I just want to know that what next means what i can do more in machine learning field after this. I just need some guidance to move on further. I use kaggle and tensor flow (don't know exactly more about this ). If you have any suggestion or any guidance that will be appreciated. submitted by /u/myteachexplore [link] [comments]  ( 9 min )
    [D] The fate of neural VQA and Semantic Scene Segmentation
    Today we live in a world of multi-model LLMs. How will the following technologies fare against these LLM-based models? Neural VQA Semantic Scene Segmentation Multi-model LLM are emerging quickly now, (such as NExT-GPT https://next-gpt.github.io/ ) . When you consider the kind of "understanding" of a visual scene these models are capable of, what will happen to prior approaches like Neural VQA? The nagging feeling that Neural VQA is going to be completely superseded by LLMs is palpable. The only vestige left for the older technology may have something to do with reasoning about the objects , such as properly counting the number of objects of a category that are present. But even that is getting sketchy. On the topic of scene understanding, we can turn to semantic scene segmentation. SSS is a more complicated topic than Neural VQA. SOTA SSS algorithms are still largely employing DeConv Nets, and still require fully labelled datasets. With multi-model LLMs, there is a nagging question : Why go through the complexity/mess of first segmenting a scene very accurately, when an LLM can do better at identifying the entire scene's category in one fail swoop? One might suggest that SSS still has a use in regards to interacting with the segmented objects of an environment, where one such "interaction" would be avoiding collisions with pedestrians, trees, or other cars. But honestly, SSS does not really make this connection with planning and action, it really only gives you the categories of the segments. THe autonomous vehicle's next moves are still an open problem. What technologies do you expect that multi-model LLMs will supersede, if any? submitted by /u/moschles [link] [comments]  ( 9 min )
    [R] Three Ways to Generate AI Art Using Intel Arc GPUs
    submitted by /u/reps_up [link] [comments]  ( 9 min )
    [D] Using GANs to help understand latent representations of small dataset
    Hey. First of all I'm not a researcher on this area, so pardon my ignorance. I'm looking to employ a GAN on a dataset. The goals are still a bit unclear, but it's mainly to improve classification by either data augmentation and class balancing, or understanding the data through the latent representations. I'm really keen on InfoGAN at the moment. I trained one on the MNIST and the continuous variables learned the rotation and width, as in the paper and other peoples code. But at this point I think I need some help. I have labels, which means that maybe I should use a conditional GAN. But how will I learn similar representations as those in InfoGAN? I know StyleGAN is the current big thing in this area, but my images are limited to similar pixel-range as MNIST, and StyleGAN seems dependant on the ProGAN idea of increasing resolution for training. I'm a bit confused on the whole topic and would love a pointer to any discussion etc., as I can't seem to find anything but papers and they seem to be only focused on human faces, my data is unnatural not unsimilar again to MNIST. I don't have semantical information either as I see many papers employing that. I see many papers employing semi-supervision in this area, but honestly I'm just a bit lost and overwhelmed as this is not my area and GAN papers are still not stopping (I read a post here from 2017 about a GAN making GANs...). If you read this far: thank you and any pointers and discussion are very welcome. I would post on /r/learnmachinelearning but I feel the discussion in there is very different from this. My main goal is data exploration, but also to prove effectiveness some classification will be necessary and here the generative approach may help to augment data efficiently. submitted by /u/Infamous-Bank-7739 [link] [comments]  ( 10 min )
    [D] How do I change my domain (from DS to MLE)
    Hi guys need some advice, I have been working as a data scientist for the past 3 years, mostly in the domain of time series & predictive analytics, hence do not currently have any major NLP/CV/Deep learning projects as such. Now, I can see that NLP/CV/Gen AI is mostly in demand and they are really enjoyable as well. How do I shift into these domains, given that new companies having these roles are asking for similar past working experience? ​ submitted by /u/immortal_omen [link] [comments]  ( 9 min )
    [D] Ganimede, Jupyter Whiteboard
    I have been working on a alternative to Jupyter Notebooks. Please check it out and share your thoughts : https://github.com/nottherealsanta/ganimede ​ https://preview.redd.it/k8rcx8fwrjob1.png?width=2302&format=png&auto=webp&s=a8a670251f6c268acffc88a40bd528d8d438a5f5 submitted by /u/notsorealsanta [link] [comments]  ( 9 min )
    [D] How to Evaluate Spectrograms?
    How would you evaluate generated spectrogram audio quality? Taking Riffusion for example, how would you then compare its performance to another generator? What are some common techniques that I could use? I mean of course purely in the quality of the audio itself, not my subjective opinion on how much I like the music submitted by /u/DavesEmployee [link] [comments]  ( 9 min )
  • Open

    AI Apps Product Development Canvas – Part 2
    In part 1 of this series on the updated “AI Apps Development Canvas,” I introduced the updated AI Apps Product Development Design Canvas.  The AI Apps Product Development Canva is one of the capstone deliverables for my “Thinking Like a Data Scientist” methodology, so getting feedback is critical to ensure that the methodology is relevant… Read More »AI Apps Product Development Canvas – Part 2 The post AI Apps Product Development Canvas – Part 2 appeared first on Data Science Central.  ( 25 min )
  • Open

    Bard, describe what the pandemic and insurrection have made people feel.
    The COVID-19 pandemic and the January 6th insurrection have had a profound impact on people's feelings. Many people have experienced a range of negative emotions, including: Fear and anxiety: The pandemic has brought with it a constant fear of getting sick or losing loved ones. The insurrection also raised fears about the future of democracy and the safety of the country. Grief and loss: Millions of people have lost loved ones to COVID-19. Many others have experienced loss in other ways, such as losing their jobs, businesses, or homes. The insurrection also led to loss of trust in the government and in each other. Anger and frustration: Many people are angry about the way the pandemic and the insurrection have been handled. They may be frustrated with government officials, with their …  ( 10 min )
    Bard, what percent of our thoughts just come to us on their own without our consciously choosing them?
    There is no one definitive answer to this question, as the percentage of our thoughts that come to us on their own without our consciously choosing them can vary depending on the individual and the situation. However, some estimates suggest that up to 95% of our thoughts may be involuntary. This is because our thoughts are constantly being influenced by a variety of factors, both external and internal. For example, our environment, our emotions, and our physical state can all play a role in determining what thoughts pop into our heads. In addition, our brains are constantly processing information and making predictions, even when we are not actively thinking about anything. This process can sometimes lead to spontaneous thoughts, such as when we suddenly remember something that we had forgotten or when we come up with a new idea. While some of our spontaneous thoughts may be negative or intrusive, many of them can also be helpful or creative. For example, a spontaneous thought may help us to solve a problem, come up with a new solution, or simply make us smile. Overall, it is clear that a significant portion of our thoughts come to us without our consciously choosing them. This is a natural part of the human cognitive process and can be both beneficial and detrimental. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    Are there any good ai video editing services available?
    I have a lighting business and I have LOADS of videos, time lapses, images from our projects. But I can't make decent videos with them for social media. I'd love to find an ai service where I can upload a projects media and prompt the ai with exactly what I want. Like, please use this content to create marketing videos for us on tiktok, Instagram and facebook Facebook Etc submitted by /u/RulerOfThePixel [link] [comments]  ( 9 min )
    How can I generate the missing part of the trick? Does this technique have a name?
    submitted by /u/farineziq [link] [comments]  ( 8 min )
    One-Minute Daily AI News 9/15/2023
    A little boy named Alex saw 17 different doctors over the course of three years, unable to find a root cause of his chronic pain. At her wit’s end, his mom, Courtney, fed his radiology report into ChatGPT and produced immediate answers.[1] In January, Wharton professor Christian Terwiesch gave his MBA final exam to ChatGPT. It passed with flying colors. Now, he’s at it again with a new experiment to determine whether ChatGPT can come up with product ideas better and faster than his students. It can. And cheaper, too.[2] Bathroom-cleaning robot built for commercial businesses gives consumers hope for AI maid.[3] Judge admits he used ChatGPT to write a Court of Appeal ruling as he calls the AI tool ‘jolly useful’.[4] Sources: [1] https://radiologybusiness.com/topics/artificial-intelligence/after-seeing-17-different-doctors-boy-rare-condition-receives-diagnosis-chatgpt [2] https://knowledge.wharton.upenn.edu/article/is-chatgpt-a-better-entrepreneur-than-most/ [3] https://www.foxnews.com/lifestyle/bathroom-cleaning-robot-built-commercial-businesses-gives-consumers-hope-ai-maid [4] https://www.dailymail.co.uk/news/article-12524607/Judge-ChatGPT-write-Court-Appeal-ruling-AI-useful.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Hybrid Algorithm Selection and Hyperparameter Tuning on Distributed Machine Learning Resources: A Hierarchical Agent-based Approach. (arXiv:2309.06604v2 [cs.LG] UPDATED)
    Algorithm selection and hyperparameter tuning are critical steps in both academic and applied machine learning. On the other hand, these steps are becoming ever increasingly delicate due to the extensive rise in the number, diversity, and distributedness of machine learning resources. Multi-agent systems, when applied to the design of machine learning platforms, bring about several distinctive characteristics such as scalability, flexibility, and robustness, just to name a few. This paper proposes a fully automatic and collaborative agent-based mechanism for selecting distributedly organized machine learning algorithms and simultaneously tuning their hyperparameters. Our method builds upon an existing agent-based hierarchical machine-learning platform and augments its query structure to support the aforementioned functionalities without being limited to specific learning, selection, and tuning mechanisms. We have conducted theoretical assessments, formal verification, and analytical study to demonstrate the correctness, resource utilization, and computational efficiency of our technique. According to the results, our solution is totally correct and exhibits linear time and space complexity in relation to the size of available resources. To provide concrete examples of how the proposed methodologies can effectively adapt and perform across a range of algorithmic options and datasets, we have also conducted a series of experiments using a system comprised of 24 algorithms and 9 datasets.  ( 3 min )
    Nowhere coexpanding functions. (arXiv:2303.12814v3 [stat.ML] UPDATED)
    We define a family of $C^1$ functions which we call "nowhere coexpanding functions" that is closed under composition and includes all $C^3$ functions with non-positive Schwarzian derivative. We establish results on the number and nature of the fixed points of these functions, including a generalisation of a classic result of Singer.  ( 2 min )
    Uncertainty-aware Traffic Prediction under Missing Data. (arXiv:2309.06800v2 [cs.LG] UPDATED)
    Traffic prediction is a crucial topic because of its broad scope of applications in the transportation domain. Recently, various studies have achieved promising results. However, most studies assume the prediction locations have complete or at least partial historical records and cannot be extended to non-historical recorded locations. In real-life scenarios, the deployment of sensors could be limited due to budget limitations and installation availability, which makes most current models not applicable. Though few pieces of literature tried to impute traffic states at the missing locations, these methods need the data simultaneously observed at the locations with sensors, making them not applicable to prediction tasks. Another drawback is the lack of measurement of uncertainty in prediction, making prior works unsuitable for risk-sensitive tasks or involving decision-making. To fill the gap, inspired by the previous inductive graph neural network, this work proposed an uncertainty-aware framework with the ability to 1) extend prediction to missing locations with no historical records and significantly extend spatial coverage of prediction locations while reducing deployment of sensors and 2) generate probabilistic prediction with uncertainty quantification to help the management of risk and decision making in the down-stream tasks. Through extensive experiments on real-life datasets, the result shows our method achieved promising results on prediction tasks, and the uncertainty quantification gives consistent results which highly correlated with the locations with and without historical data. We also show that our model could help support sensor deployment tasks in the transportation field to achieve higher accuracy with a limited sensor deployment budget.  ( 3 min )
    Compressed Real Numbers for AI: a case-study using a RISC-V CPU. (arXiv:2309.07158v1 [cs.LG])
    As recently demonstrated, Deep Neural Networks (DNN), usually trained using single precision IEEE 754 floating point numbers (binary32), can also work using lower precision. Therefore, 16-bit and 8-bit compressed format have attracted considerable attention. In this paper, we focused on two families of formats that have already achieved interesting results in compressing binary32 numbers in machine learning applications, without sensible degradation of the accuracy: bfloat and posit. Even if 16-bit and 8-bit bfloat/posit are routinely used for reducing the storage of the weights/biases of trained DNNs, the inference still often happens on the 32-bit FPU of the CPU (especially if GPUs are not available). In this paper we propose a way to decompress a tensor of bfloat/posits just before computations, i.e., after the compressed operands have been loaded within the vector registers of a vector capable CPU, in order to save bandwidth usage and increase cache efficiency. Finally, we show the architectural parameters and considerations under which this solution is advantageous with respect to the uncompressed one.  ( 2 min )
    eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models. (arXiv:2309.00964v2 [cs.LG] UPDATED)
    Since Large Language Models or LLMs have demonstrated high-quality performance on many complex language tasks, there is a great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection. However, the size of LLMs (i.e., billions of parameters) requires highly effective compression to fit into storage-limited devices. Among many compression techniques, weight-clustering, a form of non-linear quantization, is one of the leading candidates for LLM compression, and supported by modern smartphones. Yet, its training overhead is prohibitively significant for LLM fine-tuning. Especially, Differentiable KMeans Clustering, or DKM, has shown the state-of-the-art trade-off between compression ratio and accuracy regression, but its large memory complexity makes it nearly impossible to apply to train-time LLM compression. In this paper, we propose a memory-efficient DKM implementation, eDKM powered by novel techniques to reduce the memory footprint of DKM by orders of magnitudes. For a given tensor to be saved on CPU for the backward pass of DKM, we compressed the tensor by applying uniquification and sharding after checking if there is no duplicated tensor previously copied to CPU. Our experimental results demonstrate that \prjname can fine-tune and compress a pretrained LLaMA 7B model from 12.6 GB to 2.5 GB (3bit/weight) with the Alpaca dataset by reducing the train-time memory footprint of a decoder layer by 130$\times$, while delivering good accuracy on broader LLM benchmarks (i.e., 77.7% for PIQA, 66.1% for Winograde, and so on).  ( 3 min )
    Machine Learning-Assisted Discovery of Novel Reactor Designs. (arXiv:2308.08841v2 [cs.CE] UPDATED)
    Additive manufacturing has enabled the fabrication of advanced reactor geometries, permitting larger, more complex design spaces. Identifying promising configurations within such spaces presents a significant challenge for current approaches. Furthermore, existing parameterisations of reactor geometries are low-dimensional with expensive optimisation limiting more complex solutions. To address this challenge, we establish a machine learning-assisted approach for the design of the next-generation of chemical reactors, combining the application of high-dimensional parameterisations, computational fluid dynamics, and multi-fidelity Bayesian optimisation. We associate the development of mixing-enhancing vortical flow structures in novel coiled reactors with performance, and use our approach to identify key characteristics of optimal designs. By appealing to fluid mechanical principles, we rationalise the selection of novel design features that lead to experimental performance improvements of ~60% over conventional designs. Our results demonstrate that coupling advanced manufacturing techniques with `augmented-intelligence' approaches can lead to superior design performance and, consequently, emissions-reduction and sustainability.  ( 2 min )
    Transferable Graph Neural Fingerprint Models for Quick Response to Future Bio-Threats. (arXiv:2308.01921v2 [q-bio.BM] UPDATED)
    Fast screening of drug molecules based on the ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprint is a promising method for developing molecular docking surrogates with high throughput and great fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we trained graph neural fingerprint docking models for high-throughput virtual COVID-19 drug screening. The graph neural fingerprint models yield high prediction accuracy on docking scores with the mean squared error lower than $0.21$ kcal/mol for most of the docking targets, showing significant improvement over conventional circular fingerprint methods. To make the neural fingerprints transferable for unknown targets, we also propose a transferable graph neural fingerprint method trained on multiple targets. With comparable accuracy to target-specific graph neural fingerprint models, the transferable model exhibits superb training and data efficiency. We highlight that the impact of this study extends beyond COVID-19 dataset, as our approach for fast virtual ligand screening can be easily adapted and integrated into a general machine learning-accelerated pipeline to battle future bio-threats.  ( 3 min )
    Semantic Adversarial Attacks via Diffusion Models. (arXiv:2309.07398v1 [cs.CV])
    Traditional adversarial attacks concentrate on manipulating clean examples in the pixel space by adding adversarial perturbations. By contrast, semantic adversarial attacks focus on changing semantic attributes of clean examples, such as color, context, and features, which are more feasible in the real world. In this paper, we propose a framework to quickly generate a semantic adversarial attack by leveraging recent diffusion models since semantic information is included in the latent space of well-trained diffusion models. Then there are two variants of this framework: 1) the Semantic Transformation (ST) approach fine-tunes the latent space of the generated image and/or the diffusion model itself; 2) the Latent Masking (LM) approach masks the latent space with another target image and local backpropagation-based interpretation methods. Additionally, the ST approach can be applied in either white-box or black-box settings. Extensive experiments are conducted on CelebA-HQ and AFHQ datasets, and our framework demonstrates great fidelity, generalizability, and transferability compared to other baselines. Our approaches achieve approximately 100% attack success rate in multiple settings with the best FID as 36.61. Code is available at https://github.com/steven202/semantic_adv_via_dm.  ( 2 min )
    Meta-Learning Regrasping Strategies for Physical-Agnostic Objects. (arXiv:2205.11110v2 [cs.RO] UPDATED)
    Grasping inhomogeneous objects in real-world applications remains a challenging task due to the unknown physical properties such as mass distribution and coefficient of friction. In this study, we propose a meta-learning algorithm called ConDex, which incorporates Conditional Neural Processes (CNP) with DexNet-2.0 to autonomously discern the underlying physical properties of objects using depth images. ConDex efficiently acquires physical embeddings from limited trials, enabling precise grasping point estimation. Furthermore, ConDex is capable of updating the predicted grasping quality iteratively from new trials in an online fashion. To the best of our knowledge, we are the first who generate two object datasets focusing on inhomogeneous physical properties with varying mass distributions and friction coefficients. Extensive evaluations in simulation demonstrate ConDex's superior performance over DexNet-2.0 and existing meta-learning-based grasping pipelines. Furthermore, ConDex shows robust generalization to previously unseen real-world objects despite training solely in the simulation. The synthetic and real-world datasets will be published as well.  ( 2 min )
    An Adaptive Federated Relevance Framework for Spatial Temporal Graph Learning. (arXiv:2206.03420v3 [cs.LG] UPDATED)
    Spatial-temporal data contains rich information and has been widely studied in recent years due to the rapid development of relevant applications in many fields. For instance, medical institutions often use electrodes attached to different parts of a patient to analyse the electorencephal data rich with spatial and temporal features for health assessment and disease diagnosis. Existing research has mainly used deep learning techniques such as convolutional neural network (CNN) or recurrent neural network (RNN) to extract hidden spatial-temporal features. Yet, it is challenging to incorporate both inter-dependencies spatial information and dynamic temporal changes simultaneously. In reality, for a model that leverages these spatial-temporal features to fulfil complex prediction tasks, it often requires a colossal amount of training data in order to obtain satisfactory model performance. Considering the above-mentioned challenges, we propose an adaptive federated relevance framework, namely FedRel, for spatial-temporal graph learning in this paper. After transforming the raw spatial-temporal data into high quality features, the core Dynamic Inter-Intra Graph (DIIG) module in the framework is able to use these features to generate the spatial-temporal graphs capable of capturing the hidden topological and long-term temporal correlation information in these graphs. To improve the model generalization ability and performance while preserving the local data privacy, we also design a relevance-driven federated learning module in our framework to leverage diverse data distributions from different participants with attentive aggregations of their models.
    Self-optimizing Feature Generation via Categorical Hashing Representation and Hierarchical Reinforcement Crossing. (arXiv:2309.04612v2 [cs.LG] UPDATED)
    Feature generation aims to generate new and meaningful features to create a discriminative representation space.A generated feature is meaningful when the generated feature is from a feature pair with inherent feature interaction. In the real world, experienced data scientists can identify potentially useful feature-feature interactions, and generate meaningful dimensions from an exponentially large search space, in an optimal crossing form over an optimal generation path. But, machines have limited human-like abilities.We generalize such learning tasks as self-optimizing feature generation. Self-optimizing feature generation imposes several under-addressed challenges on existing systems: meaningful, robust, and efficient generation. To tackle these challenges, we propose a principled and generic representation-crossing framework to solve self-optimizing feature generation.To achieve hashing representation, we propose a three-step approach: feature discretization, feature hashing, and descriptive summarization. To achieve reinforcement crossing, we develop a hierarchical reinforcement feature crossing approach.We present extensive experimental results to demonstrate the effectiveness and efficiency of the proposed method. The code is available at https://github.com/yingwangyang/HRC_feature_cross.git.  ( 2 min )
    What Matters to Enhance Traffic Rule Compliance of Imitation Learning for Automated Driving. (arXiv:2309.07808v1 [cs.CV])
    More research attention has recently been given to end-to-end autonomous driving technologies where the entire driving pipeline is replaced with a single neural network because of its simpler structure and faster inference time. Despite this appealing approach largely reducing the components in driving pipeline, its simplicity also leads to interpretability problems and safety issues arXiv:2003.06404. The trained policy is not always compliant with the traffic rules and it is also hard to discover the reason for the misbehavior because of the lack of intermediate outputs. Meanwhile, Sensors are also critical to autonomous driving's security and feasibility to perceive the surrounding environment under complex driving scenarios. In this paper, we proposed P-CSG, a novel penalty-based imitation learning approach with cross semantics generation sensor fusion technologies to increase the overall performance of End-to-End Autonomous Driving. We conducted an assessment of our model's performance using the Town 05 Long benchmark, achieving an impressive driving score improvement of over 15%. Furthermore, we conducted robustness evaluations against adversarial attacks like FGSM and Dot attacks, revealing a substantial increase in robustness compared to baseline models.More detailed information, such as code-based resources, ablation studies and videos can be found at https://hk-zh.github.io/p-csg-plus.  ( 2 min )
    Directed Scattering for Knowledge Graph-based Cellular Signaling Analysis. (arXiv:2309.07813v1 [cs.LG])
    Directed graphs are a natural model for many phenomena, in particular scientific knowledge graphs such as molecular interaction or chemical reaction networks that define cellular signaling relationships. In these situations, source nodes typically have distinct biophysical properties from sinks. Due to their ordered and unidirectional relationships, many such networks also have hierarchical and multiscale structure. However, the majority of methods performing node- and edge-level tasks in machine learning do not take these properties into account, and thus have not been leveraged effectively for scientific tasks such as cellular signaling network inference. We propose a new framework called Directed Scattering Autoencoder (DSAE) which uses a directed version of a geometric scattering transform, combined with the non-linear dimensionality reduction properties of an autoencoder and the geometric properties of the hyperbolic space to learn latent hierarchies. We show this method outperforms numerous others on tasks such as embedding directed graphs and learning cellular signaling networks.  ( 2 min )
    Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining. (arXiv:2309.07332v1 [cs.LG])
    Accurately labeling biomedical data presents a challenge. Traditional semi-supervised learning methods often under-utilize available unlabeled data. To address this, we propose a novel reliability-based training data cleaning method employing inductive conformal prediction (ICP). This method capitalizes on a small set of accurately labeled training data and leverages ICP-calculated reliability metrics to rectify mislabeled data and outliers within vast quantities of noisy training data. The efficacy of the method is validated across three classification tasks within distinct modalities: filtering drug-induced-liver-injury (DILI) literature with title and abstract, predicting ICU admission of COVID-19 patients through CT radiomics and electronic health records, and subtyping breast cancer using RNA-sequencing data. Varying levels of noise to the training labels were introduced through label permutation. Results show significant enhancements in classification performance: accuracy enhancement in 86 out of 96 DILI experiments (up to 11.4%), AUROC and AUPRC enhancements in all 48 COVID-19 experiments (up to 23.8% and 69.8%), and accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing experiments (up to 74.6% and 89.0%). Our method offers the potential to substantially boost classification performance in multi-modal biomedical machine learning tasks. Importantly, it accomplishes this without necessitating an excessive volume of meticulously curated training data.  ( 3 min )
    Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary tasks. (arXiv:2309.07794v1 [cs.CL])
    Effectively leveraging multimodal information from social media posts is essential to various downstream tasks such as sentiment analysis, sarcasm detection and hate speech classification. However, combining text and image information is challenging because of the idiosyncratic cross-modal semantics with hidden or complementary information present in matching image-text pairs. In this work, we aim to directly model this by proposing the use of two auxiliary losses jointly with the main task when fine-tuning any pre-trained multimodal model. Image-Text Contrastive (ITC) brings image-text representations of a post closer together and separates them from different posts, capturing underlying dependencies. Image-Text Matching (ITM) facilitates the understanding of semantic correspondence between images and text by penalizing unrelated pairs. We combine these objectives with five multimodal models, demonstrating consistent improvements across four popular social media datasets. Furthermore, through detailed analysis, we shed light on the specific scenarios and cases where each auxiliary task proves to be most effective.  ( 2 min )
    Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System. (arXiv:2307.16834v3 [cs.CV] UPDATED)
    Innovative enhancement in embedded system platforms, specifically hardware accelerations, significantly influence the application of deep learning in real-world scenarios. These innovations translate human labor efforts into automated intelligent systems employed in various areas such as autonomous driving, robotics, Internet-of-Things (IoT), and numerous other impactful applications. NVIDIA's Jetson platform is one of the pioneers in offering optimal performance regarding energy efficiency and throughput in the execution of deep learning algorithms. Previously, most benchmarking analysis was based on 2D images with a single deep learning model for each comparison result. In this paper, we implement an end-to-end video-based crime-scene anomaly detection system inputting from surveillance videos and the system is deployed and completely operates on multiple Jetson edge devices (Nano, AGX Xavier, Orin Nano). The comparison analysis includes the integration of Torch-TensorRT as a software developer kit from NVIDIA for the model performance optimisation. The system is built based on the PySlowfast open-source project from Facebook as the coding template. The end-to-end system process comprises the videos from camera, data preprocessing pipeline, feature extractor and the anomaly detection. We provide the experience of an AI-based system deployment on various Jetson Edge devices with Docker technology. Regarding anomaly detectors, a weakly supervised video-based deep learning model called Robust Temporal Feature Magnitude Learning (RTFM) is applied in the system. The approach system reaches 47.56 frames per second (FPS) inference speed on a Jetson edge device with only 3.11 GB RAM usage total. We also discover the promising Jetson device that the AI system achieves 15% better performance than the previous version of Jetson devices while consuming 50% less energy power.  ( 3 min )
    Multi-step prediction of chlorophyll concentration based on Adaptive Graph-Temporal Convolutional Network with Series Decomposition. (arXiv:2309.07187v1 [cs.LG])
    Chlorophyll concentration can well reflect the nutritional status and algal blooms of water bodies, and is an important indicator for evaluating water quality. The prediction of chlorophyll concentration change trend is of great significance to environmental protection and aquaculture. However, there is a complex and indistinguishable nonlinear relationship between many factors affecting chlorophyll concentration. In order to effectively mine the nonlinear features contained in the data. This paper proposes a time-series decomposition adaptive graph-time convolutional network ( AGTCNSD ) prediction model. Firstly, the original sequence is decomposed into trend component and periodic component by moving average method. Secondly, based on the graph convolutional neural network, the water quality parameter data is modeled, and a parameter embedding matrix is defined. The idea of matrix decomposition is used to assign weight parameters to each node. The adaptive graph convolution learns the relationship between different water quality parameters, updates the state information of each parameter, and improves the learning ability of the update relationship between nodes. Finally, time dependence is captured by time convolution to achieve multi-step prediction of chlorophyll concentration. The validity of the model is verified by the water quality data of the coastal city Beihai. The results show that the prediction effect of this method is better than other methods. It can be used as a scientific resource for environmental management decision-making.  ( 3 min )
    Reasoning with Language Model Prompting: A Survey. (arXiv:2212.09597v7 [cs.CL] UPDATED)
    Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc. This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting. We introduce research works with comparisons and summaries and provide systematic resources to help beginners. We also discuss the potential reasons for emerging such reasoning abilities and highlight future research directions. Resources are available at https://github.com/zjunlp/Prompt4ReasoningPapers (updated periodically).  ( 2 min )
    Detecting Misinformation with LLM-Predicted Credibility Signals and Weak Supervision. (arXiv:2309.07601v1 [cs.CL])
    Credibility signals represent a wide range of heuristics that are typically used by journalists and fact-checkers to assess the veracity of online content. Automating the task of credibility signal extraction, however, is very challenging as it requires high-accuracy signal-specific extractors to be trained, while there are currently no sufficiently large datasets annotated with all credibility signals. This paper investigates whether large language models (LLMs) can be prompted effectively with a set of 18 credibility signals to produce weak labels for each signal. We then aggregate these potentially noisy labels using weak supervision in order to predict content veracity. We demonstrate that our approach, which combines zero-shot LLM credibility signal labeling and weak supervision, outperforms state-of-the-art classifiers on two misinformation datasets without using any ground-truth labels for training. We also analyse the contribution of the individual credibility signals towards predicting content veracity, which provides new valuable insights into their role in misinformation detection.  ( 2 min )
    Text Classification of Cancer Clinical Trial Eligibility Criteria. (arXiv:2309.07812v1 [cs.CL])
    Automatic identification of clinical trials for which a patient is eligible is complicated by the fact that trial eligibility is stated in natural language. A potential solution to this problem is to employ text classification methods for common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a pre-trained language model specifically for clinical trials, which yields the highest average performance across all criteria.  ( 2 min )
    Understanding Vector-Valued Neural Networks and Their Relationship with Real and Hypercomplex-Valued Neural Networks. (arXiv:2309.07716v1 [cs.LG])
    Despite the many successful applications of deep learning models for multidimensional signal and image processing, most traditional neural networks process data represented by (multidimensional) arrays of real numbers. The intercorrelation between feature channels is usually expected to be learned from the training data, requiring numerous parameters and careful training. In contrast, vector-valued neural networks are conceived to process arrays of vectors and naturally consider the intercorrelation between feature channels. Consequently, they usually have fewer parameters and often undergo more robust training than traditional neural networks. This paper aims to present a broad framework for vector-valued neural networks, referred to as V-nets. In this context, hypercomplex-valued neural networks are regarded as vector-valued models with additional algebraic properties. Furthermore, this paper explains the relationship between vector-valued and traditional neural networks. Precisely, a vector-valued neural network can be obtained by placing restrictions on a real-valued model to consider the intercorrelation between feature channels. Finally, we show how V-nets, including hypercomplex-valued neural networks, can be implemented in current deep-learning libraries as real-valued networks.  ( 2 min )
    Entropy-based machine learning model for diagnosis and monitoring of Parkinson's Disease in smart IoT environment. (arXiv:2309.07134v1 [eess.SP])
    The study presents the concept of a computationally efficient machine learning (ML) model for diagnosing and monitoring Parkinson's disease (PD) in an Internet of Things (IoT) environment using rest-state EEG signals (rs-EEG). We computed different types of entropy from EEG signals and found that Fuzzy Entropy performed the best in diagnosing and monitoring PD using rs-EEG. We also investigated different combinations of signal frequency ranges and EEG channels to accurately diagnose PD. Finally, with a fewer number of features (11 features), we achieved a maximum classification accuracy (ARKF) of ~99.9%. The most prominent frequency range of EEG signals has been identified, and we have found that high classification accuracy depends on low-frequency signal components (0-4 Hz). Moreover, the most informative signals were mainly received from the right hemisphere of the head (F8, P8, T8, FC6). Furthermore, we assessed the accuracy of the diagnosis of PD using three different lengths of EEG data (150-1000 samples). Because the computational complexity is reduced by reducing the input data. As a result, we have achieved a maximum mean accuracy of 99.9% for a sample length (LEEG) of 1000 (~7.8 seconds), 98.2% with a LEEG of 800 (~6.2 seconds), and 79.3% for LEEG = 150 (~1.2 seconds). By reducing the number of features and segment lengths, the computational cost of classification can be reduced. Lower-performance smart ML sensors can be used in IoT environments for enhances human resilience to PD.  ( 3 min )
    Random Feature Amplification: Feature Learning and Generalization in Neural Networks. (arXiv:2202.07626v4 [cs.LG] UPDATED)
    In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.  ( 2 min )
    deepFDEnet: A Novel Neural Network Architecture for Solving Fractional Differential Equations. (arXiv:2309.07684v1 [cs.LG])
    The primary goal of this research is to propose a novel architecture for a deep neural network that can solve fractional differential equations accurately. A Gaussian integration rule and a $L_1$ discretization technique are used in the proposed design. In each equation, a deep neural network is used to approximate the unknown function. Three forms of fractional differential equations have been examined to highlight the method's versatility: a fractional ordinary differential equation, a fractional order integrodifferential equation, and a fractional order partial differential equation. The results show that the proposed architecture solves different forms of fractional differential equations with excellent precision.  ( 2 min )
    Interpretable Weighted Siamese Network to Predict the Time to Onset of Alzheimer's Disease from MRI Images. (arXiv:2304.07097v2 [eess.IV] UPDATED)
    Alzheimer's Disease (AD) is a progressive disease preceded by Mild Cognitive Impairment (MCI). Early detection of AD is crucial for making treatment decisions. However, most of the literature on computer-assisted detection of AD focuses on classifying brain images into one of three major categories: healthy, MCI, and AD; or categorizing MCI patients into (1) progressive: those who progress from MCI to AD at a future examination time, and (2) stable: those who stay as MCI and never progress to AD. This misses the opportunity to accurately identify the trajectory of progressive MCI patients. In this paper, we revisit the brain image classification task for AD identification and re-frame it as an ordinal classification task to predict how close a patient is to the severe AD stage. To this end, we select progressive MCI patients from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and construct an ordinal dataset with a prediction target that indicates the time to progression to AD. We train a Siamese network model to predict the time to onset of AD based on MRI brain images. We also propose a Weighted variety of Siamese network and compare its performance to a baseline model. Our evaluations show that incorporating a weighting factor to Siamese networks brings considerable performance gain at predicting how close input brain MRI images are to progressing to AD. Moreover, we complement our results with an interpretation of the learned embedding space of the Siamese networks using a model explainability technique.  ( 3 min )
    Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning. (arXiv:2211.10851v3 [cs.AI] UPDATED)
    Reinforcement Learning views the maximization of rewards and avoidance of punishments as central to explaining goal-directed behavior. However, over a life, organisms will need to learn about many different aspects of the world's structure: the states of the world and state-vector transition dynamics. The number of combinations of states grows exponentially as an agent incorporates new knowledge, and there is no obvious weighted combination of pre-existing rewards or costs defined for a given combination of states, as such a weighting would need to encode information about good and bad combinations prior to an agent's experience in the world. Therefore, we must develop more naturalistic accounts of behavior and motivation in large state-spaces. We show that it is possible to use only the intrinsic motivation metric of empowerment, which measures the agent's capacity to realize many possible futures under a transition operator. We propose to scale empowerment to hierarchical state-spaces by using Operator Bellman Equations. These equations produce state-time feasibility functions, which are compositional hierarchical state-time transition operators that map an initial state and time when an agent begins a policy to the final states and times of completing a goal. Because these functions are hierarchical operators we can define hierarchical empowerment measures on them. An agent can then optimize plans to distant states and times to maximize its hierarchical empowerment-gain, allowing it to discover goals that bring about a more favorable coupling of its internal structure (physiological states) to its external environment (world structure & spatial state). Life-long agents could therefore be primarily animated by principles of compositionality and empowerment, exhibiting self-concern for the growth & maintenance of their own structural integrity without recourse to reward-maximization.  ( 3 min )
    Masked Transformer for Electrocardiogram Classification. (arXiv:2309.07136v1 [eess.SP])
    Electrocardiogram (ECG) is one of the most important diagnostic tools in clinical applications. With the advent of advanced algorithms, various deep learning models have been adopted for ECG tasks. However, the potential of Transformers for ECG data is not yet realized, despite their widespread success in computer vision and natural language processing. In this work, we present a useful masked Transformer method for ECG classification referred to as MTECG, which expands the application of masked autoencoders to ECG time series. We construct a dataset comprising 220,251 ECG recordings with a broad range of diagnoses annoated by medical experts to explore the properties of MTECG. Under the proposed training strategies, a lightweight model with 5.7M parameters performs stably well on a broad range of masking ratios (5%-75%). The ablation studies highlight the importance of fluctuated reconstruction targets, training schedule length, layer-wise LR decay and DropPath rate. The experiments on both private and public ECG datasets demonstrate that MTECG-T significantly outperforms the recent state-of-the-art algorithms in ECG classification.  ( 2 min )
    Structure-Preserving Transformers for Sequences of SPD Matrices. (arXiv:2309.07579v1 [cs.LG])
    In recent years, Transformer-based auto-attention mechanisms have been successfully applied to the analysis of a variety of context-reliant data types, from texts to images and beyond, including data from non-Euclidean geometries. In this paper, we present such a mechanism, designed to classify sequences of Symmetric Positive Definite matrices while preserving their Riemannian geometry throughout the analysis. We apply our method to automatic sleep staging on timeseries of EEG-derived covariance matrices from a standard dataset, obtaining high levels of stage-wise performance.
    A Novel Local-Global Feature Fusion Framework for Body-weight Exercise Recognition with Pressure Mapping Sensors. (arXiv:2309.07888v1 [cs.CV])
    We present a novel local-global feature fusion framework for body-weight exercise recognition with floor-based dynamic pressure maps. One step further from the existing studies using deep neural networks mainly focusing on global feature extraction, the proposed framework aims to combine local and global features using image processing techniques and the YOLO object detection to localize pressure profiles from different body parts and consider physical constraints. The proposed local feature extraction method generates two sets of high-level local features consisting of cropped pressure mapping and numerical features such as angular orientation, location on the mat, and pressure area. In addition, we adopt a knowledge distillation for regularization to preserve the knowledge of the global feature extraction and improve the performance of the exercise recognition. Our experimental results demonstrate a notable 11 percent improvement in F1 score for exercise recognition while preserving label-specific features.  ( 2 min )
    Deep reinforced learning heuristic tested on spin-glass ground states: The larger picture. (arXiv:2302.10848v2 [cond-mat.dis-nn] UPDATED)
    In Changjun Fan et al. [Nature Communications https://doi.org/10.1038/s41467-023-36363-w (2023)], the authors present a deep reinforced learning approach to augment combinatorial optimization heuristics. In particular, they present results for several spin glass ground state problems, for which instances on non-planar networks are generally NP-hard, in comparison with several Monte Carlo based methods, such as simulated annealing (SA) or parallel tempering (PT). Indeed, those results demonstrate that the reinforced learning improves the results over those obtained with SA or PT, or at least allows for reduced runtimes for the heuristics before results of comparable quality have been obtained relative to those other methods. To facilitate the conclusion that their method is ''superior'', the authors pursue two basic strategies: (1) A commercial GUROBI solver is called on to procure a sample of exact ground states as a testbed to compare with, and (2) a head-to-head comparison between the heuristics is given for a sample of larger instances where exact ground states are hard to ascertain. Here, we put these studies into a larger context, showing that the claimed superiority is at best marginal for smaller samples and becomes essentially irrelevant with respect to any sensible approximation of true ground states in the larger samples. For example, this method becomes irrelevant as a means to determine stiffness exponents $\theta$ in $d>2$, as mentioned by the authors, where the problem is not only NP-hard but requires the subtraction of two almost equal ground-state energies and systemic errors in each of $\approx 1\%$ found here are unacceptable. This larger picture on the method arises from a straightforward finite-size corrections study over the spin glass ensembles the authors employ, using data that has been available for decades.
    DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal. (arXiv:2309.00855v3 [cs.LG] UPDATED)
    The marketplace system connecting demands and supplies has been explored to develop unbiased decision-making in valuing properties. Real estate appraisal serves as one of the high-cost property valuation tasks for financial institutions since it requires domain experts to appraise the estimation based on the corresponding knowledge and the judgment of the market. Existing automated valuation models reducing the subjectivity of domain experts require a large number of transactions for effective evaluation, which is predominantly limited to not only the labeling efforts of transactions but also the generalizability of new developing and rural areas. To learn representations from unlabeled real estate sets, existing self-supervised learning (SSL) for tabular data neglects various important features, and fails to incorporate domain knowledge. In this paper, we propose DoRA, a Domain-based self-supervised learning framework for low-resource Real estate Appraisal. DoRA is pre-trained with an intra-sample geographic prediction as the pretext task based on the metadata of the real estate for equipping the real estate representations with prior domain knowledge. Furthermore, inter-sample contrastive learning is employed to generalize the representations to be robust for limited transactions of downstream tasks. Our benchmark results on three property types of real-world transactions show that DoRA significantly outperforms the SSL baselines for tabular data, the graph-based methods, and the supervised approaches in the few-shot scenarios by at least 7.6% for MAPE, 11.59% for MAE, and 3.34% for HR10%. We expect DoRA to be useful to other financial practitioners with similar marketplace applications who need general models for properties that are newly built and have limited records. The source code is available at https://github.com/wwweiwei/DoRA.
    Evaluation of Parameter-based Attacks against Embedded Neural Networks with Laser Injection. (arXiv:2304.12876v2 [cs.CR] UPDATED)
    Upcoming certification actions related to the security of machine learning (ML) based systems raise major evaluation challenges that are amplified by the large-scale deployment of models in many hardware platforms. Until recently, most of research works focused on API-based attacks that consider a ML model as a pure algorithmic abstraction. However, new implementation-based threats have been revealed, emphasizing the urgency to propose both practical and simulation-based methods to properly evaluate the robustness of models. A major concern is parameter-based attacks (such as the Bit-Flip Attack, BFA) that highlight the lack of robustness of typical deep neural network models when confronted by accurate and optimal alterations of their internal parameters stored in memory. Setting in a security testing purpose, this work practically reports, for the first time, a successful variant of the BFA on a 32-bit Cortex-M microcontroller using laser fault injection. It is a standard fault injection means for security evaluation, that enables to inject spatially and temporally accurate faults. To avoid unrealistic brute-force strategies, we show how simulations help selecting the most sensitive set of bits from the parameters taking into account the laser fault model.
    Learning nonparametric DAGs with incremental information via high-order HSIC. (arXiv:2308.05969v2 [cs.LG] UPDATED)
    Score-based methods for learning Bayesain networks(BN) aim to maximizing the global score functions. However, if local variables have direct and indirect dependence simultaneously, the global optimization on score functions misses edges between variables with indirect dependent relationship, of which scores are smaller than those with direct dependent relationship. In this paper, we present an identifiability condition based on a determined subset of parents to identify the underlying DAG. By the identifiability condition, we develop a two-phase algorithm namely optimal-tuning (OT) algorithm to locally amend the global optimization. In the optimal phase, an optimization problem based on first-order Hilbert-Schmidt independence criterion (HSIC) gives an estimated skeleton as the initial determined parents subset. In the tuning phase, the skeleton is locally tuned by deletion, addition and DAG-formalization strategies using the theoretically proved incremental properties of high-order HSIC. Numerical experiments for different synthetic datasets and real-world datasets show that the OT algorithm outperforms existing methods. Especially in Sigmoid Mix model with the size of the graph being ${\rm\bf d=40}$, the structure intervention distance (SID) of the OT algorithm is 329.7 smaller than the one obtained by CAM, which indicates that the graph estimated by the OT algorithm misses fewer edges compared with CAM.Source code of the OT algorithm is available at https://github.com/YafeiannWang/optimal-tune-algorithm.
    TensorFlow Chaotic Prediction and Blow Up. (arXiv:2309.07450v1 [cs.LG])
    Predicting the dynamics of chaotic systems is one of the most challenging tasks for neural networks, and machine learning in general. Here we aim to predict the spatiotemporal chaotic dynamics of a high-dimensional non-linear system. In our attempt we use the TensorFlow library, representing the state of the art for deep neural networks training and prediction. While our results are encouraging, and show that the dynamics of the considered system can be predicted for short time, we also indirectly discovered an unexpected and undesirable behavior of the TensorFlow library. More specifically, the longer term prediction of the system's chaotic behavior quickly deteriorates and blows up due to the nondeterministic behavior of the TensorFlow library. Here we provide numerical evidence of the short time prediction ability, and of the longer term predictability blow up.
    Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense. (arXiv:2309.06724v2 [cs.CV] UPDATED)
    We aim to provide a general framework of for computational photography that recovers the real scene from imperfect images, via the Deep Nonparametric Convexified Filtering (DNCF). It is consists of a nonparametric deep network to resemble the physical equations behind the image formation, such as denoising, super-resolution, inpainting, and flash. DNCF has no parameterization dependent on training data, therefore has a strong generalization and robustness to adversarial image manipulation. During inference, we also encourage the network parameters to be nonnegative and create a bi-convex function on the input and parameters, and this adapts to second-order optimization algorithms with insufficient running time, having 10X acceleration over Deep Image Prior. With these tools, we empirically verify its capability to defend image classification deep networks against adversary attack algorithms in real-time.
    Autotuning Apache TVM-based Scientific Applications Using Bayesian Optimization. (arXiv:2309.07235v1 [cs.LG])
    Apache TVM (Tensor Virtual Machine), an open source machine learning compiler framework designed to optimize computations across various hardware platforms, provides an opportunity to improve the performance of dense matrix factorizations such as LU (Lower Upper) decomposition and Cholesky decomposition on GPUs and AI (Artificial Intelligence) accelerators. In this paper, we propose a new TVM autotuning framework using Bayesian Optimization and use the TVM tensor expression language to implement linear algebra kernels such as LU, Cholesky, and 3mm. We use these scientific computation kernels to evaluate the effectiveness of our methods on a GPU cluster, called Swing, at Argonne National Laboratory. We compare the proposed autotuning framework with the TVM autotuning framework AutoTVM with four tuners and find that our framework outperforms AutoTVM in most cases.
    Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning. (arXiv:2309.07742v1 [cs.LG])
    Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post-hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is understandable only insofar as it can be understood by the human at the receiving end. The key challenge in Human-interpretable Representation Learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks. Our formalization of HRL builds on recent advances in causal representation learning and explicitly models a human stakeholder as an external observer. This allows us to derive a principled notion of alignment between the machine representation and the vocabulary of concepts understood by the human. In doing so, we link alignment and interpretability through a simple and intuitive name transfer game, and clarify the relationship between alignment and a well-known property of representations, namely disentanglment. We also show that alignment is linked to the issue of undesirable correlations among concepts, also known as concept leakage, and to content-style separation, all through a general information-theoretic reformulation of these properties. Our conceptualization aims to bridge the gap between the human and algorithmic sides of interpretability and establish a stepping stone for new research on human-interpretable representations.
    Learning to Warm-Start Fixed-Point Optimization Algorithms. (arXiv:2309.07835v1 [math.OC])
    We introduce a machine-learning framework to warm-start fixed-point optimization algorithms. Our architecture consists of a neural network mapping problem parameters to warm starts, followed by a predefined number of fixed-point iterations. We propose two loss functions designed to either minimize the fixed-point residual or the distance to a ground truth solution. In this way, the neural network predicts warm starts with the end-to-end goal of minimizing the downstream loss. An important feature of our architecture is its flexibility, in that it can predict a warm start for fixed-point algorithms run for any number of steps, without being limited to the number of steps it has been trained on. We provide PAC-Bayes generalization bounds on unseen data for common classes of fixed-point operators: contractive, linearly convergent, and averaged. Applying this framework to well-known applications in control, statistics, and signal processing, we observe a significant reduction in the number of iterations and solution time required to solve these problems, through learned warm starts.
    On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems. (arXiv:1906.00331v9 [cs.LG] UPDATED)
    We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$ and $\mathcal{Y}$ is a convex and bounded set. One of the most popular algorithms for solving this problem is the celebrated gradient descent ascent (GDA) algorithm, which has been widely used in machine learning, control theory and economics. Despite the extensive convergence results for the convex-concave setting, GDA with equal stepsize can converge to limit cycles or even diverge in a general setting. In this paper, we present the complexity results on two-time-scale GDA for solving nonconvex-concave minimax problems, showing that the algorithm can find a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently. To the best our knowledge, this is the first nonasymptotic analysis for two-time-scale GDA in this setting, shedding light on its superior practical performance in training generative adversarial networks (GANs) and other real applications.
    Causal Entropy and Information Gain for Measuring Causal Control. (arXiv:2309.07703v1 [cs.LG])
    Artificial intelligence models and methods commonly lack causal interpretability. Despite the advancements in interpretable machine learning (IML) methods, they frequently assign importance to features which lack causal influence on the outcome variable. Selecting causally relevant features among those identified as relevant by these methods, or even before model training, would offer a solution. Feature selection methods utilizing information theoretical quantities have been successful in identifying statistically relevant features. However, the information theoretical quantities they are based on do not incorporate causality, rendering them unsuitable for such scenarios. To address this challenge, this article proposes information theoretical quantities that incorporate the causal structure of the system, which can be used to evaluate causal importance of features for some given outcome variable. Specifically, we introduce causal versions of entropy and mutual information, termed causal entropy and causal information gain, which are designed to assess how much control a feature provides over the outcome variable. These newly defined quantities capture changes in the entropy of a variable resulting from interventions on other variables. Fundamental results connecting these quantities to the existence of causal effects are derived. The use of causal information gain in feature selection is demonstrated, highlighting its superiority over standard mutual information in revealing which features provide control over a chosen outcome variable. Our investigation paves the way for the development of methods with improved interpretability in domains involving causation.
    Improving physics-informed DeepONets with hard constraints. (arXiv:2309.07899v1 [cs.LG])
    Current physics-informed (standard or operator) neural networks still rely on accurately learning the initial conditions of the system they are solving. In contrast, standard numerical methods evolve such initial conditions without needing to learn these. In this study, we propose to improve current physics-informed deep learning strategies such that initial conditions do not need to be learned and are represented exactly in the predicted solution. Moreover, this method guarantees that when a DeepONet is applied multiple times to time step a solution, the resulting function is continuous.
    Identifying the Group-Theoretic Structure of Machine-Learned Symmetries. (arXiv:2309.07860v1 [hep-ph])
    Deep learning was recently successfully used in deriving symmetry transformations that preserve important physics quantities. Being completely agnostic, these techniques postpone the identification of the discovered symmetries to a later stage. In this letter we propose methods for examining and identifying the group-theoretic structure of such machine-learned symmetries. We design loss functions which probe the subalgebra structure either during the deep learning stage of symmetry discovery or in a subsequent post-processing stage. We illustrate the new methods with examples from the U(n) Lie group family, obtaining the respective subalgebra decompositions. As an application to particle physics, we demonstrate the identification of the residual symmetries after the spontaneous breaking of non-Abelian gauge symmetries like SU(3) and SU(5) which are commonly used in model building.
    A DenseNet-based method for decoding auditory spatial attention with EEG. (arXiv:2309.07690v1 [eess.SP])
    Auditory spatial attention detection (ASAD) aims to decode the attended spatial location with EEG in a multiple-speaker setting. ASAD methods are inspired by the brain lateralization of cortical neural responses during the processing of auditory spatial attention, and show promising performance for the task of auditory attention decoding (AAD) with neural recordings. In the previous ASAD methods, the spatial distribution of EEG electrodes is not fully exploited, which may limit the performance of these methods. In the present work, by transforming the original EEG channels into a two-dimensional (2D) spatial topological map, the EEG data is transformed into a three-dimensional (3D) arrangement containing spatial-temporal information. And then a 3D deep convolutional neural network (DenseNet-3D) is used to extract temporal and spatial features of the neural representation for the attended locations. The results show that the proposed method achieves higher decoding accuracy than the state-of-the-art (SOTA) method (94.4% compared to XANet's 90.6%) with 1-second decision window for the widely used KULeuven (KUL) dataset, and the code to implement our work is available on Github: https://github.com/xuxiran/ASAD_DenseNet
    Pareto Adversarial Robustness: Balancing Spatial Robustness and Sensitivity-based Robustness. (arXiv:2111.01996v2 [cs.LG] UPDATED)
    Adversarial robustness, which primarily comprises sensitivity-based robustness and spatial robustness, plays an integral part in achieving robust generalization. In this paper, we endeavor to design strategies to achieve universal adversarial robustness. To achieve this, we first investigate the relatively less-explored realm of spatial robustness. Then, we integrate the existing spatial robustness methods by incorporating both local and global spatial vulnerability into a unified spatial attack and adversarial training approach. Furthermore, we present a comprehensive relationship between natural accuracy, sensitivity-based robustness, and spatial robustness, supported by strong evidence from the perspective of robust representation. Crucially, to reconcile the interplay between the mutual impacts of various robustness components into one unified framework, we incorporate the \textit{Pareto criterion} into the adversarial robustness analysis, yielding a novel strategy called Pareto Adversarial Training for achieving universal robustness. The resulting Pareto front, which delineates the set of optimal solutions, provides an optimal balance between natural accuracy and various adversarial robustness. This sheds light on solutions for achieving universal robustness in the future. To the best of our knowledge, we are the first to consider universal adversarial robustness via multi-objective optimization.
    Goal Space Abstraction in Hierarchical Reinforcement Learning via Set-Based Reachability Analysis. (arXiv:2309.07675v1 [cs.LG])
    Open-ended learning benefits immensely from the use of symbolic methods for goal representation as they offer ways to structure knowledge for efficient and transferable learning. However, the existing Hierarchical Reinforcement Learning (HRL) approaches relying on symbolic reasoning are often limited as they require a manual goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this paper, we propose a developmental mechanism for goal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We introduce a Feudal HRL algorithm that concurrently learns both the goal representation and a hierarchical policy. The algorithm uses symbolic reachability analysis for neural networks to approximate the transition relation among sets of states and to refine the goal representation. We evaluate our approach on complex navigation tasks, showing the learned representation is interpretable, transferrable and results in data efficient learning.
    Semi-supervised Domain Adaptation on Graphs with Contrastive Learning and Minimax Entropy. (arXiv:2309.07402v1 [cs.LG])
    Label scarcity in a graph is frequently encountered in real-world applications due to the high cost of data labeling. To this end, semi-supervised domain adaptation (SSDA) on graphs aims to leverage the knowledge of a labeled source graph to aid in node classification on a target graph with limited labels. SSDA tasks need to overcome the domain gap between the source and target graphs. However, to date, this challenging research problem has yet to be formally considered by the existing approaches designed for cross-graph node classification. To tackle the SSDA problem on graphs, a novel method called SemiGCL is proposed, which benefits from graph contrastive learning and minimax entropy training. SemiGCL generates informative node representations by contrasting the representations learned from a graph's local and global views. Additionally, SemiGCL is adversarially optimized with the entropy loss of unlabeled target nodes to reduce domain divergence. Experimental results on benchmark datasets demonstrate that SemiGCL outperforms the state-of-the-art baselines on the SSDA tasks.
    Beta Diffusion. (arXiv:2309.07867v1 [cs.LG])
    We introduce beta diffusion, a novel generative modeling method that integrates demasking and denoising to generate data within bounded ranges. Using scaled and shifted beta distributions, beta diffusion utilizes multiplicative transitions over time to create both forward and reverse diffusion processes, maintaining beta distributions in both the forward marginals and the reverse conditionals, given the data at any point in time. Unlike traditional diffusion-based generative models relying on additive Gaussian noise and reweighted evidence lower bounds (ELBOs), beta diffusion is multiplicative and optimized with KL-divergence upper bounds (KLUBs) derived from the convexity of the KL divergence. We demonstrate that the proposed KLUBs are more effective for optimizing beta diffusion compared to negative ELBOs, which can also be derived as the KLUBs of the same KL divergence with its two arguments swapped. The loss function of beta diffusion, expressed in terms of Bregman divergence, further supports the efficacy of KLUBs for optimization. Experimental results on both synthetic data and natural images demonstrate the unique capabilities of beta diffusion in generative modeling of range-bounded data and validate the effectiveness of KLUBs in optimizing diffusion models, thereby making them valuable additions to the family of diffusion-based generative models and the optimization techniques used to train them.
    Conformal Regression in Calorie Prediction for Team Jumbo-Visma. (arXiv:2304.03778v3 [cs.LG] UPDATED)
    UCI WorldTour races, the premier men's elite road cycling tour, are grueling events that put physical fitness and endurance of riders to the test. The coaches of Team Jumbo-Visma have long been responsible for predicting the energy needs of each rider of the Dutch team for every race on the calendar. Those must be estimated to ensure riders have the energy and resources necessary to maintain a high level of performance throughout a race. This task, however, is both time-consuming and challenging, as it requires precise estimates of race speed and power output. Traditionally, the approach to predicting energy needs has relied on judgement and experience of coaches, but this method has its limitations and often leads to inaccurate predictions. In this paper, we propose a new, more effective approach to predicting energy needs for cycling races. By predicting the speed and power with regression models, we provide the coaches with calorie needs estimates for each individual rider per stage instantly. In addition, we compare methods to quantify uncertainty using conformal prediction. The empirical analysis of the jackknife+, jackknife-minmax, jackknife-minmax-after-bootstrap, CV+, CV-minmax, conformalized quantile regression, and inductive conformal prediction methods in conformal prediction reveals that all methods achieve valid prediction intervals. All but minmax-based methods also produce sufficiently narrow prediction intervals for decision-making. Furthermore, methods computing prediction intervals of fixed size produce tighter intervals for low significance values. Among the methods computing intervals of varying length across the input space, inductive conformal prediction computes narrower prediction intervals at larger significance level.
    Multi-Source Domain Adaptation meets Dataset Distillation through Dataset Dictionary Learning. (arXiv:2309.07666v1 [cs.LG])
    In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second attacks the problem of synthesizing a small summary containing all the information about the datasets. We thus consider a new problem called MSDA-DD. To solve it, we adapt previous works in the MSDA literature, such as Wasserstein Barycenter Transport and Dataset Dictionary Learning, as well as DD method Distribution Matching. We thoroughly experiment with this novel problem on four benchmarks (Caltech-Office 10, Tennessee-Eastman Process, Continuous Stirred Tank Reactor, and Case Western Reserve University), where we show that, even with as little as 1 sample per class, one achieves state-of-the-art adaptation performance.
    Optimal transport distances for directed, weighted graphs: a case study with cell-cell communication networks. (arXiv:2309.07030v2 [cs.LG] UPDATED)
    Comparing graphs by means of optimal transport has recently gained significant attention, as the distances induced by optimal transport provide both a principled metric between graphs as well as an interpretable description of the associated changes between graphs in terms of a transport plan. As the lack of symmetry introduces challenges in the typically considered formulations, optimal transport distances for graphs have mostly been developed for undirected graphs. Here, we propose two distance measures to compare directed graphs based on variants of optimal transport: (i) an earth movers distance (Wasserstein) and (ii) a Gromov-Wasserstein (GW) distance. We evaluate these two distances and discuss their relative performance for both simulated graph data and real-world directed cell-cell communication graphs, inferred from single-cell RNA-seq data.
    Preserved Edge Convolutional Neural Network for Sensitivity Enhancement of Deuterium Metabolic Imaging (DMI). (arXiv:2309.04100v2 [eess.IV] UPDATED)
    Purpose: Common to most MRSI techniques, the spatial resolution and the minimal scan duration of Deuterium Metabolic Imaging (DMI) are limited by the achievable SNR. This work presents a deep learning method for sensitivity enhancement of DMI. Methods: A convolutional neural network (CNN) was designed to estimate the 2H-labeled metabolite concentrations from low SNR and distorted DMI FIDs. The CNN was trained with synthetic data that represent a range of SNR levels typically encountered in vivo. The estimation precision was further improved by fine-tuning the CNN with MRI-based edge-preserving regularization for each DMI dataset. The proposed processing method, PReserved Edge ConvolutIonal neural network for Sensitivity Enhanced DMI (PRECISE-DMI), was applied to simulation studies and in vivo experiments to evaluate the anticipated improvements in SNR and investigate the potential for inaccuracies. Results: PRECISE-DMI visually improved the metabolic maps of low SNR datasets, and quantitatively provided higher precision than the standard Fourier reconstruction. Processing of DMI data acquired in rat brain tumor models resulted in more precise determination of 2H-labeled lactate and glutamate + glutamine levels, at increased spatial resolution (from >8 to 2 $\mu$L) or shortened scan time (from 32 to 4 min) compared to standard acquisitions. However, rigorous SD-bias analyses showed that overuse of the edge-preserving regularization can compromise the accuracy of the results. Conclusion: PRECISE-DMI allows a flexible trade-off between enhancing the sensitivity of DMI and minimizing the inaccuracies. With typical settings, the DMI sensitivity can be improved by 3-fold while retaining the capability to detect local signal variations.
    Massively-Parallel Heat Map Sorting and Applications To Explainable Clustering. (arXiv:2309.07486v1 [cs.DS])
    Given a set of points labeled with $k$ labels, we introduce the heat map sorting problem as reordering and merging the points and dimensions while preserving the clusters (labels). A cluster is preserved if it remains connected, i.e., if it is not split into several clusters and no two clusters are merged. We prove the problem is NP-hard and we give a fixed-parameter algorithm with a constant number of rounds in the massively parallel computation model, where each machine has a sublinear memory and the total memory of the machines is linear. We give an approximation algorithm for a NP-hard special case of the problem. We empirically compare our algorithm with k-means and density-based clustering (DBSCAN) using a dimensionality reduction via locality-sensitive hashing on several directed and undirected graphs of email and computer networks.
    Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models. (arXiv:2211.02048v4 [cs.CV] UPDATED)
    During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This leads to a significant waste of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that users prone to gradually edit the input image. This motivates us to cache and reuse the feature maps of the original image. Given an edited image, we sparsely apply the convolutional filters to the edited regions while reusing the cached features for the unedited areas. Based on our algorithm, we further propose Sparse Incremental Generative Engine (SIGE) to convert the computation reduction to latency reduction on off-the-shelf hardware. With about $1\%$-area edits, SIGE accelerates DDPM by $3.0\times$ on NVIDIA RTX 3090 and $4.6\times$ on Apple M1 Pro GPU, Stable Diffusion by $7.2\times$ on 3090, and GauGAN by $5.6\times$ on 3090 and $5.2\times$ on M1 Pro GPU. Compared to our conference version, we extend SIGE to accommodate attention layers and apply it to Stable Diffusion. Additionally, we offer support for Apple M1 Pro GPU and include more results with large and sequential edits.
    EnCodecMAE: Leveraging neural codecs for universal audio representation learning. (arXiv:2309.07391v1 [cs.SD])
    The goal of universal audio representation learning is to obtain foundational models that can be used for a variety of downstream tasks involving speech, music or environmental sounds. To approach this problem, methods inspired by self-supervised models from NLP, like BERT, are often used and adapted to audio. These models rely on the discrete nature of text, hence adopting this type of approach for audio processing requires either a change in the learning objective or mapping the audio signal to a set of discrete classes. In this work, we explore the use of EnCodec, a neural audio codec, to generate discrete targets for learning an universal audio model based on a masked autoencoder (MAE). We evaluate this approach, which we call EncodecMAE, on a wide range of audio tasks spanning speech, music and environmental sounds, achieving performances comparable or better than leading audio representation models.
    Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data. (arXiv:2202.05928v4 [cs.LG] UPDATED)
    Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
    Simultaneous inference for generalized linear models with unmeasured confounders. (arXiv:2309.07261v1 [stat.ME])
    Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It first leverages multivariate responses to separate marginal and uncorrelated confounding effects, recovering the confounding coefficients' column space. Subsequently, latent factors and primary effects are jointly estimated, utilizing $\ell_1$-regularization for sparsity while imposing orthogonality onto confounding coefficients. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish various effects' identification conditions and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.
    DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection. (arXiv:2309.07147v1 [eess.SP])
    Auditory Attention Detection (AAD) aims to detect target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches primarily rely on traditional convolutional neural network designed for processing Euclidean data like images. This makes it challenging to handle EEG signals, which possess non-Euclidean characteristics. In order to address this problem, this paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input. Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention in EEG signals. In addition, to further improve AAD detection performance, self-distillation, consisting of feature distillation and hierarchical distillation strategies at each layer, is integrated. These strategies leverage features and classification results from the deepest network layers to guide the learning of shallow layers. Our experiments are conducted on two publicly available datasets, KUL and DTU. Under a 1-second time window, we achieve results of 90.0\% and 79.6\% accuracy on KUL and DTU, respectively. We compare our DGSD method with competitive baselines, and the experimental results indicate that the detection performance of our proposed DGSD method is not only superior to the best reproducible baseline but also significantly reduces the number of trainable parameters by approximately 100 times.
    Systematic Review of Experimental Paradigms and Deep Neural Networks for Electroencephalography-Based Cognitive Workload Detection. (arXiv:2309.07163v1 [eess.SP])
    This article summarizes a systematic review of the electroencephalography (EEG)-based cognitive workload (CWL) estimation. The focus of the article is twofold: identify the disparate experimental paradigms used for reliably eliciting discreet and quantifiable levels of cognitive load and the specific nature and representational structure of the commonly used input formulations in deep neural networks (DNNs) used for signal classification. The analysis revealed a number of studies using EEG signals in its native representation of a two-dimensional matrix for offline classification of CWL. However, only a few studies adopted an online or pseudo-online classification strategy for real-time CWL estimation. Further, only a couple of interpretable DNNs and a single generative model were employed for cognitive load detection till date during this review. More often than not, researchers were using DNNs as black-box type models. In conclusion, DNNs prove to be valuable tools for classifying EEG signals, primarily due to the substantial modeling power provided by the depth of their network architecture. It is further suggested that interpretable and explainable DNN models must be employed for cognitive workload estimation since existing methods are limited in the face of the non-stationary nature of the signal.
    The Grand Illusion: The Myth of Software Portability and Implications for ML Progress. (arXiv:2309.07181v1 [cs.SE])
    Pushing the boundaries of machine learning often requires exploring different hardware and software combinations. However, the freedom to experiment across different tooling stacks can be at odds with the drive for efficiency, which has produced increasingly specialized AI hardware and incentivized consolidation around a narrow set of ML frameworks. Exploratory research can be restricted if software and hardware are co-evolving, making it even harder to stray away from mainstream ideas that work well with popular tooling stacks. While this friction increasingly impacts the rate of innovation in machine learning, to our knowledge the lack of portability in tooling has not been quantified. In this work, we ask: How portable are popular ML software frameworks? We conduct a large-scale study of the portability of mainstream ML frameworks across different hardware types. Our findings paint an uncomfortable picture -- frameworks can lose more than 40% of their key functions when ported to other hardware. Worse, even when functions are portable, the slowdown in their performance can be extreme and render performance untenable. Collectively, our results reveal how costly straying from a narrow set of hardware-software combinations can be - and suggest that specialization of hardware impedes innovation in machine learning research.
    The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection. (arXiv:2309.07192v1 [eess.IV])
    Machine Learning (ML) has emerged as a promising approach in healthcare, outperforming traditional statistical techniques. However, to establish ML as a reliable tool in clinical practice, adherence to best practices regarding data handling, experimental design, and model evaluation is crucial. This work summarizes and strictly observes such practices to ensure reproducible and reliable ML. Specifically, we focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of challenging problem in healthcare. We investigate the impact of different data augmentation techniques and model complexity on the overall performance. We consider MRI data from ADNI dataset to address a classification problem employing 3D Convolutional Neural Network (CNN). The experiments are designed to compensate for data scarcity and initial random parameters by utilizing cross-validation and multiple training trials. Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures, each varying in the number of convolutional layers. Specifically, the augmentation strategies are based on affine transformations, such as zoom, shift, and rotation, applied concurrently or separately. The combined effect of data augmentation and model complexity leads to a variation in prediction performance up to 10% of accuracy. When affine transformation are applied separately, the model is more accurate, independently from the adopted architecture. For all strategies, the model accuracy followed a concave behavior at increasing number of convolutional layers, peaking at an intermediate value of layers. The best model (8 CL, (B)) is the most stable across cross-validation folds and training trials, reaching excellent performance both on the testing set and on an external test set.
    Improved Auto-Encoding using Deterministic Projected Belief Networks. (arXiv:2309.07481v1 [cs.LG])
    In this paper, we exploit the unique properties of a deterministic projected belief network (D-PBN) to take full advantage of trainable compound activation functions (TCAs). A D-PBN is a type of auto-encoder that operates by "backing up" through a feed-forward neural network. TCAs are activation functions with complex monotonic-increasing shapes that change the distribution of the data so that the linear transformation that follows is more effective. Because a D-PBN operates by "backing up", the TCAs are inverted in the reconstruction process, restoring the original distribution of the data, thus taking advantage of a given TCA in both analysis and reconstruction. In this paper, we show that a D-PBN auto-encoder with TCAs can significantly out-perform standard auto-encoders including variational auto-encoders.
    Statistically Valid Variable Importance Assessment through Conditional Permutations. (arXiv:2309.07593v1 [cs.LG])
    Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion. It is often implemented with variable permutation schemes. On the flip side, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model agnostic and computationally lean, as well as reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that $\textit{CPI}$ overcomes the limitations of standard permutation importance by providing accurate type-I error control. When used with a deep neural network, $\textit{CPI}$ consistently showed top accuracy across benchmarks. An empirical benchmark on real-world data analysis in a large-scale medical dataset showed that $\textit{CPI}$ provides a more parsimonious selection of statistically significant variables. Our results suggest that $\textit{CPI}$ can be readily used as drop-in replacement for permutation-based methods.
    Virchow: A Million-Slide Digital Pathology Foundation Model. (arXiv:2309.07778v1 [eess.IV])
    Computational pathology uses artificial intelligence to enable precision medicine and decision support systems through the analysis of whole slide images. It has the potential to revolutionize the diagnosis and treatment of cancer. However, a major challenge to this objective is that for many specific computational pathology tasks the amount of data is inadequate for development. To address this challenge, we created Virchow, a 632 million parameter deep neural network foundation model for computational pathology. Using self-supervised learning, Virchow is trained on 1.5 million hematoxylin and eosin stained whole slide images from diverse tissue groups, which is orders of magnitude more data than previous works. When evaluated on downstream tasks including tile-level pan-cancer detection and subtyping and slide-level biomarker prediction, Virchow outperforms state-of-the-art systems both on internal datasets drawn from the same population as the pretraining data as well as external public datasets. Virchow achieves 93% balanced accuracy for pancancer tile classification, and AUCs of 0.983 for colon microsatellite instability status prediction and 0.967 for breast CDH1 status prediction. The gains in performance highlight the importance of pretraining on massive pathology image datasets, suggesting pretraining on even larger datasets could continue improving performance for many high-impact applications where limited amounts of training data are available, such as drug outcome prediction.
    PRE: Vision-Language Prompt Learning with Reparameterization Encoder. (arXiv:2309.07760v1 [cs.CV])
    Large pre-trained vision-language models such as CLIP have demonstrated great potential in zero-shot transferability to downstream tasks. However, to attain optimal performance, the manual selection of prompts is necessary to improve alignment between the downstream image distribution and the textual class descriptions. This manual prompt engineering is the major challenge for deploying such models in practice since it requires domain expertise and is extremely time-consuming. To avoid non-trivial prompt engineering, recent work Context Optimization (CoOp) introduced the concept of prompt learning to the vision domain using learnable textual tokens. While CoOp can achieve substantial improvements over manual prompts, its learned context is worse generalizable to wider unseen classes within the same dataset. In this work, we present Prompt Learning with Reparameterization Encoder (PRE) - a simple and efficient method that enhances the generalization ability of the learnable prompt to unseen classes while maintaining the capacity to learn Base classes. Instead of directly optimizing the prompts, PRE employs a prompt encoder to reparameterize the input prompt embeddings, enhancing the exploration of task-specific knowledge from few-shot samples. Experiments and extensive ablation studies on 8 benchmarks demonstrate that our approach is an efficient method for prompt learning. Specifically, PRE achieves a notable enhancement of 5.60% in average accuracy on New classes and 3% in Harmonic mean compared to CoOp in the 16-shot setting, all achieved within a good training time.
    Advancing Regular Language Reasoning in Linear Recurrent Neural Networks. (arXiv:2309.07412v1 [cs.CL])
    In recent studies, linear recurrent neural networks (LRNNs) have achieved Transformer-level performance in natural language modeling and long-range modeling while offering rapid parallel training and constant inference costs. With the resurged interest in LRNNs, we study whether they can learn the hidden rules in training sequences, such as the grammatical structures of regular language. We theoretically analyze some existing LRNNs and discover their limitations on regular language. Motivated by the analysis, we propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix. Experiments suggest that the proposed model is the only LRNN that can perform length extrapolation on regular language tasks such as Sum, Even Pair, and Modular Arithmetic.
    Learning Beyond Similarities: Incorporating Dissimilarities between Positive Pairs in Self-Supervised Time Series Learning. (arXiv:2309.07526v1 [cs.LG])
    By identifying similarities between successive inputs, Self-Supervised Learning (SSL) methods for time series analysis have demonstrated their effectiveness in encoding the inherent static characteristics of temporal data. However, an exclusive emphasis on similarities might result in representations that overlook the dynamic attributes critical for modeling cardiovascular diseases within a confined subject cohort. Introducing Distilled Encoding Beyond Similarities (DEBS), this paper pioneers an SSL approach that transcends mere similarities by integrating dissimilarities among positive pairs. The framework is applied to electrocardiogram (ECG) signals, leading to a notable enhancement of +10\% in the detection accuracy of Atrial Fibrillation (AFib) across diverse subjects. DEBS underscores the potential of attaining a more refined representation by encoding the dynamic characteristics of time series data, tapping into dissimilarities during the optimization process. Broadly, the strategy delineated in this study holds the promise of unearthing novel avenues for advancing SSL methodologies tailored to temporal data.
    Unbiased Face Synthesis With Diffusion Models: Are We There Yet?. (arXiv:2309.07277v1 [cs.CV])
    Text-to-image diffusion models have achieved widespread popularity due to their unprecedented image generation capability. In particular, their ability to synthesize and modify human faces has spurred research into using generated face images in both training data augmentation and model performance assessments. In this paper, we study the efficacy and shortcomings of generative models in the context of face generation. Utilizing a combination of qualitative and quantitative measures, including embedding-based metrics and user studies, we present a framework to audit the characteristics of generated faces conditioned on a set of social attributes. We applied our framework on faces generated through state-of-the-art text-to-image diffusion models. We identify several limitations of face image generation that include faithfulness to the text prompt, demographic disparities, and distributional shifts. Furthermore, we present an analytical model that provides insights into how training data selection contributes to the performance of generative models.
    SC-MAD: Mixtures of Higher-order Networks for Data Augmentation. (arXiv:2309.07453v1 [stat.ML])
    The myriad complex systems with multiway interactions motivate the extension of graph-based pairwise connections to higher-order relations. In particular, the simplicial complex has inspired generalizations of graph neural networks (GNNs) to simplicial complex-based models. Learning on such systems requires large amounts of data, which can be expensive or impossible to obtain. We propose data augmentation of simplicial complexes through both linear and nonlinear mixup mechanisms that return mixtures of existing labeled samples. In addition to traditional pairwise mixup, we present a convex clustering mixup approach for a data-driven relationship among several simplicial complexes. We theoretically demonstrate that the resultant synthetic simplicial complexes interpolate among existing data with respect to homomorphism densities. Our method is demonstrated on both synthetic and real-world datasets for simplicial complex classification.
    Mitigating Adversarial Attacks in Federated Learning with Trusted Execution Environments. (arXiv:2309.07197v1 [cs.LG])
    The main premise of federated learning (FL) is that machine learning model updates are computed locally to preserve user data privacy. This approach avoids by design user data to ever leave the perimeter of their device. Once the updates aggregated, the model is broadcast to all nodes in the federation. However, without proper defenses, compromised nodes can probe the model inside their local memory in search for adversarial examples, which can lead to dangerous real-world scenarios. For instance, in image-based applications, adversarial examples consist of images slightly perturbed to the human eye getting misclassified by the local model. These adversarial images are then later presented to a victim node's counterpart model to replay the attack. Typical examples harness dissemination strategies such as altered traffic signs (patch attacks) no longer recognized by autonomous vehicles or seemingly unaltered samples that poison the local dataset of the FL scheme to undermine its robustness. Pelta is a novel shielding mechanism leveraging Trusted Execution Environments (TEEs) that reduce the ability of attackers to craft adversarial samples. Pelta masks inside the TEE the first part of the back-propagation chain rule, typically exploited by attackers to craft the malicious samples. We evaluate Pelta on state-of-the-art accurate models using three well-established datasets: CIFAR-10, CIFAR-100 and ImageNet. We show the effectiveness of Pelta in mitigating six white-box state-of-the-art adversarial attacks, such as Projected Gradient Descent, Momentum Iterative Method, Auto Projected Gradient Descent, the Carlini & Wagner attack. In particular, Pelta constitutes the first attempt at defending an ensemble model against the Self-Attention Gradient attack to the best of our knowledge. Our code is available to the research community at https://github.com/queyrusi/Pelta.
    Exploring Large Language Models for Ontology Alignment. (arXiv:2309.07172v1 [cs.AI])
    This work investigates the applicability of recent generative Large Language Models (LLMs), such as the GPT series and Flan-T5, to ontology alignment for identifying concept equivalence mappings across ontologies. To test the zero-shot performance of Flan-T5-XXL and GPT-3.5-turbo, we leverage challenging subsets from two equivalence matching datasets of the OAEI Bio-ML track, taking into account concept labels and structural contexts. Preliminary findings suggest that LLMs have the potential to outperform existing ontology alignment systems like BERTMap, given careful framework and prompt design.
    Frequency Convergence of Complexon Shift Operators. (arXiv:2309.07169v1 [eess.SP])
    Topological signal processing (TSP) utilizes simplicial complexes to model structures with higher order than vertices and edges. In this paper, we study the transferability of TSP via a generalized higher-order version of graphon, known as complexon. We recall the notion of a complexon as the limit of a simplicial complex sequence [1]. Inspired by the integral operator form of graphon shift operators, we construct a marginal complexon and complexon shift operator (CSO) according to components of all possible dimensions from the complexon. We investigate the CSO's eigenvalues and eigenvectors, and relate them to a new family of weighted adjacency matrices. We prove that when a simplicial complex sequence converges to a complexon, the eigenvalues of the corresponding CSOs converge to that of the limit complexon. These results hint at learning transferability on large simplicial complexes or simplicial complex sequences, which generalize the graphon signal processing framework.
    Audio-Based Classification of Respiratory Diseases using Advanced Signal Processing and Machine Learning for Assistive Diagnosis Support. (arXiv:2309.07183v1 [eess.SP])
    In global healthcare, respiratory diseases are a leading cause of mortality, underscoring the need for rapid and accurate diagnostics. To advance rapid screening techniques via auscultation, our research focuses on employing one of the largest publicly available medical database of respiratory sounds to train multiple machine learning models able to classify different health conditions. Our method combines Empirical Mode Decomposition (EMD) and spectral analysis to extract physiologically relevant biosignals from acoustic data, closely tied to cardiovascular and respiratory patterns, making our approach apart in its departure from conventional audio feature extraction practices. We use Power Spectral Density analysis and filtering techniques to select Intrinsic Mode Functions (IMFs) strongly correlated with underlying physiological phenomena. These biosignals undergo a comprehensive feature extraction process for predictive modeling. Initially, we deploy a binary classification model that demonstrates a balanced accuracy of 87% in distinguishing between healthy and diseased individuals. Subsequently, we employ a six-class classification model that achieves a balanced accuracy of 72% in diagnosing specific respiratory conditions like pneumonia and chronic obstructive pulmonary disease (COPD). For the first time, we also introduce regression models that estimate age and body mass index (BMI) based solely on acoustic data, as well as a model for gender classification. Our findings underscore the potential of this approach to significantly enhance assistive and remote diagnostic capabilities.
    Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch. (arXiv:2309.07909v1 [cs.LG])
    Unsupervised contrastive learning methods have recently seen significant improvements, particularly through data augmentation strategies that aim to produce robust and generalizable representations. However, prevailing data augmentation methods, whether hand designed or based on foundation models, tend to rely heavily on prior knowledge or external data. This dependence often compromises their effectiveness and efficiency. Furthermore, the applicability of most existing data augmentation strategies is limited when transitioning to other research domains, especially science-related data. This limitation stems from the paucity of prior knowledge and labeled data available in these domains. To address these challenges, we introduce DiffAug-a novel and efficient Diffusion-based data Augmentation technique. DiffAug aims to ensure that the augmented and original data share a smoothed latent space, which is achieved through diffusion steps. Uniquely, unlike traditional methods, DiffAug first mines sufficient prior semantic knowledge about the neighborhood. This provides a constraint to guide the diffusion steps, eliminating the need for labels, external data/models, or prior knowledge. Designed as an architecture-agnostic framework, DiffAug provides consistent improvements. Specifically, it improves image classification and clustering accuracy by 1.6%~4.5%. When applied to biological data, DiffAug improves performance by up to 10.1%, with an average improvement of 5.8%. DiffAug shows good performance in both vision and biological domains.
    Distribution Grid Line Outage Identification with Unknown Pattern and Performance Guarantee. (arXiv:2309.07157v1 [cs.LG])
    Line outage identification in distribution grids is essential for sustainable grid operation. In this work, we propose a practical yet robust detection approach that utilizes only readily available voltage magnitudes, eliminating the need for costly phase angles or power flow data. Given the sensor data, many existing detection methods based on change-point detection require prior knowledge of outage patterns, which are unknown for real-world outage scenarios. To remove this impractical requirement, we propose a data-driven method to learn the parameters of the post-outage distribution through gradient descent. However, directly using gradient descent presents feasibility issues. To address this, we modify our approach by adding a Bregman divergence constraint to control the trajectory of the parameter updates, which eliminates the feasibility problems. As timely operation is the key nowadays, we prove that the optimal parameters can be learned with convergence guarantees via leveraging the statistical and physical properties of voltage data. We evaluate our approach using many representative distribution grids and real load profiles with 17 outage configurations. The results show that we can detect and localize the outage in a timely manner with only voltage magnitudes and without assuming a prior knowledge of outage patterns.
    Effective Latent Differential Equation Models via Attention and Multiple Shooting. (arXiv:2307.05735v3 [cs.LG] UPDATED)
    Scientific Machine Learning (SciML) is a burgeoning field that synergistically combines domain-aware and interpretable models with agnostic machine learning techniques. In this work, we introduce GOKU-UI, an evolution of the SciML generative model GOKU-nets. GOKU-UI not only broadens the original model's spectrum to incorporate other classes of differential equations, such as Stochastic Differential Equations (SDEs), but also integrates attention mechanisms and a novel multiple shooting training strategy in the latent space. These modifications have led to a significant increase in its performance in both reconstruction and forecast tasks, as demonstrated by our evaluation of simulated and empirical data. Specifically, GOKU-UI outperformed all baseline models on synthetic datasets even with a training set 16-fold smaller, underscoring its remarkable data efficiency. Furthermore, when applied to empirical human brain data, while incorporating stochastic Stuart-Landau oscillators into its dynamical core, our proposed enhancements markedly increased the model's effectiveness in capturing complex brain dynamics. This augmented version not only surpassed all baseline methods in the reconstruction task, but also demonstrated lower prediction error of future brain activity up to 15 seconds ahead. By training GOKU-UI on resting state fMRI data, we encoded whole-brain dynamics into a latent representation, learning a low-dimensional dynamical system model that could offer insights into brain functionality and open avenues for practical applications such as the classification of mental states or psychiatric conditions. Ultimately, our research provides further impetus for the field of Scientific Machine Learning, showcasing the potential for advancements when established scientific insights are interwoven with modern machine learning.
    A Strong and Simple Deep Learning Baseline for BCI MI Decoding. (arXiv:2309.07159v1 [eess.SP])
    We propose EEG-SimpleConv, a straightforward 1D convolutional neural network for Motor Imagery decoding in BCI. Our main motivation is to propose a very simple baseline to compare to, using only very standard ingredients from the literature. We evaluate its performance on four EEG Motor Imagery datasets, including simulated online setups, and compare it to recent Deep Learning and Machine Learning approaches. EEG-SimpleConv is at least as good or far more efficient than other approaches, showing strong knowledge-transfer capabilities across subjects, at the cost of a low inference time. We advocate that using off-the-shelf ingredients rather than coming with ad-hoc solutions can significantly help the adoption of Deep Learning approaches for BCI. We make the code of the models and the experiments accessible.
    An Optimal Control Method to Compute the Most Likely Transition Path for Stochastic Dynamical Systems with Jumps. (arXiv:2203.16874v2 [math.NA] UPDATED)
    Many complex real world phenomena exhibit abrupt, intermittent or jumping behaviors, which are more suitable to be described by stochastic differential equations under non-Gaussian L\'evy noise. Among these complex phenomena, the most likely transition paths between metastable states are important since these rare events may have a high impact in certain scenarios. Based on the large deviation principle, the most likely transition path could be treated as the minimizer of the rate function upon paths that connect two points. One of the challenges to calculate the most likely transition path for stochastic dynamical systems under non-Gaussian L\'evy noise is that the associated rate function can not be explicitly expressed by paths. For this reason, we formulate an optimal control problem to obtain the optimal state as the most likely transition path. We then develop a neural network method to solve this issue. Several experiments are investigated for both Gaussian and non-Gaussian cases.
    Decoding visual brain representations from electroencephalography through Knowledge Distillation and latent diffusion models. (arXiv:2309.07149v1 [eess.SP])
    Decoding visual representations from human brain activity has emerged as a thriving research domain, particularly in the context of brain-computer interfaces. Our study presents an innovative method that employs to classify and reconstruct images from the ImageNet dataset using electroencephalography (EEG) data from subjects that had viewed the images themselves (i.e. "brain decoding"). We analyzed EEG recordings from 6 participants, each exposed to 50 images spanning 40 unique semantic categories. These EEG readings were converted into spectrograms, which were then used to train a convolutional neural network (CNN), integrated with a knowledge distillation procedure based on a pre-trained Contrastive Language-Image Pre-Training (CLIP)-based image classification teacher network. This strategy allowed our model to attain a top-5 accuracy of 80%, significantly outperforming a standard CNN and various RNN-based benchmarks. Additionally, we incorporated an image reconstruction mechanism based on pre-trained latent diffusion models, which allowed us to generate an estimate of the images which had elicited EEG activity. Therefore, our architecture not only decodes images from neural activity but also offers a credible image reconstruction from EEG only, paving the way for e.g. swift, individualized feedback experiments. Our research represents a significant step forward in connecting neural signals with visual cognition.
    Naturalistic Robot Arm Trajectory Generation via Representation Learning. (arXiv:2309.07550v1 [cs.RO])
    The integration of manipulator robots in household environments suggests a need for more predictable and human-like robot motion. This holds especially true for wheelchair-mounted assistive robots that can support the independence of people with paralysis. One method of generating naturalistic motion trajectories is via the imitation of human demonstrators. This paper explores a self-supervised imitation learning method using an autoregressive spatio-temporal graph neural network for an assistive drinking task. We address learning from diverse human motion trajectory data that were captured via wearable IMU sensors on a human arm as the action-free task demonstrations. Observed arm motion data from several participants is used to generate natural and functional drinking motion trajectories for a UR5e robot arm.
    Beta quantile regression for robust estimation of uncertainty in the presence of outliers. (arXiv:2309.07374v1 [cs.LG])
    Quantile Regression (QR) can be used to estimate aleatoric uncertainty in deep neural networks and can generate prediction intervals. Quantifying uncertainty is particularly important in critical applications such as clinical diagnosis, where a realistic assessment of uncertainty is essential in determining disease status and planning the appropriate treatment. The most common application of quantile regression models is in cases where the parametric likelihood cannot be specified. Although quantile regression is quite robust to outlier response observations, it can be sensitive to outlier covariate observations (features). Outlier features can compromise the performance of deep learning regression problems such as style translation, image reconstruction, and deep anomaly detection, potentially leading to misleading conclusions. To address this problem, we propose a robust solution for quantile regression that incorporates concepts from robust divergence. We compare the performance of our proposed method with (i) least trimmed quantile regression and (ii) robust regression based on the regularization of case-specific parameters in a simple real dataset in the presence of outlier. These methods have not been applied in a deep learning framework. We also demonstrate the applicability of the proposed method by applying it to a medical imaging translation task using diffusion models.
    Overview of Human Activity Recognition Using Sensor Data. (arXiv:2309.07170v1 [eess.SP])
    Human activity recognition (HAR) is an essential research field that has been used in different applications including home and workplace automation, security and surveillance as well as healthcare. Starting from conventional machine learning methods to the recently developing deep learning techniques and the Internet of things, significant contributions have been shown in the HAR area in the last decade. Even though several review and survey studies have been published, there is a lack of sensor-based HAR overview studies focusing on summarising the usage of wearable sensors and smart home sensors data as well as applications of HAR and deep learning techniques. Hence, we overview sensor-based HAR, discuss several important applications that rely on HAR, and highlight the most common machine learning methods that have been used for HAR. Finally, several challenges of HAR are explored that should be addressed to further improve the robustness of HAR.
    Tackling the dimensions in imaging genetics with CLUB-PLS. (arXiv:2309.07352v1 [q-bio.GN])
    A major challenge in imaging genetics and similar fields is to link high-dimensional data in one domain, e.g., genetic data, to high dimensional data in a second domain, e.g., brain imaging data. The standard approach in the area are mass univariate analyses across genetic factors and imaging phenotypes. That entails executing one genome-wide association study (GWAS) for each pre-defined imaging measure. Although this approach has been tremendously successful, one shortcoming is that phenotypes must be pre-defined. Consequently, effects that are not confined to pre-selected regions of interest or that reflect larger brain-wide patterns can easily be missed. In this work we introduce a Partial Least Squares (PLS)-based framework, which we term Cluster-Bootstrap PLS (CLUB-PLS), that can work with large input dimensions in both domains as well as with large sample sizes. One key factor of the framework is to use cluster bootstrap to provide robust statistics for single input features in both domains. We applied CLUB-PLS to investigating the genetic basis of surface area and cortical thickness in a sample of 33,000 subjects from the UK Biobank. We found 107 genome-wide significant locus-phenotype pairs that are linked to 386 different genes. We found that a vast majority of these loci could be technically validated at a high rate: using classic GWAS or Genome-Wide Inferred Statistics (GWIS) we found that 85 locus-phenotype pairs exceeded the genome-wide suggestive (P<1e-05) threshold.
    Sleep Stage Classification Using a Pre-trained Deep Learning Model. (arXiv:2309.07182v1 [eess.SP])
    One of the common human diseases is sleep disorders. The classification of sleep stages plays a fundamental role in diagnosing sleep disorders, monitoring treatment effectiveness, and understanding the relationship between sleep stages and various health conditions. A precise and efficient classification of these stages can significantly enhance our understanding of sleep-related phenomena and ultimately lead to improved health outcomes and disease treatment. Models others propose are often time-consuming and lack sufficient accuracy, especially in stage N1. The main objective of this research is to present a machine-learning model called "EEGMobile". This model utilizes pre-trained models and learns from electroencephalogram (EEG) spectrograms of brain signals. The model achieved an accuracy of 86.97% on a publicly available dataset named "Sleep-EDF20", outperforming other models proposed by different researchers. Moreover, it recorded an accuracy of 56.4% in stage N1, which is better than other models. These findings demonstrate that this model has the potential to achieve better results for the treatment of this disease.
    Benchmarking machine learning models for quantum state classification. (arXiv:2309.07679v1 [quant-ph])
    Quantum computing is a growing field where the information is processed by two-levels quantum states known as qubits. Current physical realizations of qubits require a careful calibration, composed by different experiments, due to noise and decoherence phenomena. Among the different characterization experiments, a crucial step is to develop a model to classify the measured state by discriminating the ground state from the excited state. In this proceedings we benchmark multiple classification techniques applied to real quantum devices.
    Design of Recognition and Evaluation System for Table Tennis Players' Motor Skills Based on Artificial Intelligence. (arXiv:2309.07141v1 [eess.SP])
    With the rapid development of electronic science and technology, the research on wearable devices is constantly updated, but for now, it is not comprehensive for wearable devices to recognize and analyze the movement of specific sports. Based on this, this paper improves wearable devices of table tennis sport, and realizes the pattern recognition and evaluation of table tennis players' motor skills through artificial intelligence. Firstly, a device is designed to collect the movement information of table tennis players and the actual movement data is processed. Secondly, a sliding window is made to divide the collected motion data into a characteristic database of six table tennis benchmark movements. Thirdly, motion features were constructed based on feature engineering, and motor skills were identified for different models after dimensionality reduction. Finally, the hierarchical evaluation system of motor skills is established with the loss functions of different evaluation indexes. The results show that in the recognition of table tennis players' motor skills, the feature-based BP neural network proposed in this paper has higher recognition accuracy and stronger generalization ability than the traditional convolutional neural network.
    Finding Influencers in Complex Networks: An Effective Deep Reinforcement Learning Approach. (arXiv:2309.07153v1 [cs.SI])
    Maximizing influences in complex networks is a practically important but computationally challenging task for social network analysis, due to its NP- hard nature. Most current approximation or heuristic methods either require tremendous human design efforts or achieve unsatisfying balances between effectiveness and efficiency. Recent machine learning attempts only focus on speed but lack performance enhancement. In this paper, different from previous attempts, we propose an effective deep reinforcement learning model that achieves superior performances over traditional best influence maximization algorithms. Specifically, we design an end-to-end learning framework that combines graph neural network as the encoder and reinforcement learning as the decoder, named DREIM. Trough extensive training on small synthetic graphs, DREIM outperforms the state-of-the-art baseline methods on very large synthetic and real-world networks on solution quality, and we also empirically show its linear scalability with regard to the network size, which demonstrates its superiority in solving this problem.
    A Deep Dive into Sleep: Single-Channel EEG-Based Sleep Stage Classification with Model Interpretability. (arXiv:2309.07156v1 [eess.SP])
    Sleep, a fundamental physiological process, occupies a significant portion of our lives. Accurate classification of sleep stages serves as a crucial tool for evaluating sleep quality and identifying probable sleep disorders. This work introduces a novel methodology that utilises a SE-Resnet-Bi-LSTM architecture to classify sleep into five separate stages. The classification process is based on the analysis of single-channel electroencephalograms (EEGs). The framework that has been suggested consists of two fundamental elements: a feature extractor that utilises SE-ResNet, and a temporal context encoder that use stacks of Bi-LSTM units.The effectiveness of our approach is substantiated by thorough assessments conducted on three different datasets, namely SLeepEDF-20, SleepEDF-78, and SHHS. Significantly, our methodology attains notable levels of accuracy, specifically 87.5\%, 83.9\%, and 87.8\%, along with macro-F1 scores of 82.5, 78.9, and 81.9 for the corresponding datasets. Notably, we introduce the utilization of 1D-GradCAM visualization to shed light on the decision-making process of our model in the realm of sleep stage classification. This visualization method not only provides valuable insights into the model's classification rationale but also aligns its outcomes with the annotations made by sleep experts. One notable feature of our research is the integration of an expedited training approach, which effectively preserves the model's resilience in terms of performance. The experimental evaluations conducted provide a comprehensive evaluation of the effectiveness of our proposed model in comparison to existing approaches, highlighting its potential for practical applications.
    Direct Text to Speech Translation System using Acoustic Units. (arXiv:2309.07478v1 [cs.CL])
    This paper proposes a direct text to speech translation system using discrete acoustic units. This framework employs text in different source languages as input to generate speech in the target language without the need for text transcriptions in this language. Motivated by the success of acoustic units in previous works for direct speech to speech translation systems, we use the same pipeline to extract the acoustic units using a speech encoder combined with a clustering algorithm. Once units are obtained, an encoder-decoder architecture is trained to predict them. Then a vocoder generates speech from units. Our approach for direct text to speech translation was tested on the new CVSS corpus with two different text mBART models employed as initialisation. The systems presented report competitive performance for most of the language pairs evaluated. Besides, results show a remarkable improvement when initialising our proposed architecture with a model pre-trained with more languages.
    Generating Parametric BRDFs from Natural Language Descriptions. (arXiv:2306.15679v2 [cs.GR] UPDATED)
    Artistic authoring of 3D environments is a laborious enterprise that also requires skilled content creators. There have been impressive improvements in using machine learning to address different aspects of generating 3D content, such as generating meshes, arranging geometry, synthesizing textures, etc. In this paper we develop a model to generate Bidirectional Reflectance Distribution Functions (BRDFs) from descriptive textual prompts. BRDFs are four dimensional probability distributions that characterize the interaction of light with surface materials. They are either represented parametrically, or by tabulating the probability density associated with every pair of incident and outgoing angles. The former lends itself to artistic editing while the latter is used when measuring the appearance of real materials. Numerous works have focused on hypothesizing BRDF models from images of materials. We learn a mapping from textual descriptions of materials to parametric BRDFs. Our model is first trained using a semi-supervised approach before being tuned via an unsupervised scheme. Although our model is general, in this paper we specifically generate parameters for MDL materials, conditioned on natural language descriptions, within NVIDIA's Omniverse platform. This enables use cases such as real-time text prompts to change materials of objects in 3D environments such as "dull plastic" or "shiny iron". Since the output of our model is a parametric BRDF, rather than an image of the material, it may be used to render materials using any shape under arbitrarily specified viewing and lighting conditions.
    Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning. (arXiv:2202.10629v3 [cs.LG] UPDATED)
    In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models and can even learn general task-agnostic representations for efficient finetuning to downstream tasks. However, deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning. This paper provides an overview of model reprogramming to bridge this gap. Model reprogramming enables resource-efficient cross-domain machine learning by repurposing and reusing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning, where the source and target domains can be vastly different. In many applications, model reprogramming outperforms transfer learning and training from scratch. This paper elucidates the methodology of model reprogramming, summarizes existing use cases, provides a theoretical explanation of the success of model reprogramming, and concludes with a discussion on open-ended research questions and opportunities. A list of model reprogramming studies is actively maintained and updated at https://github.com/IBM/model-reprogramming.
    A Robust SINDy Approach by Combining Neural Networks and an Integral Form. (arXiv:2309.07193v1 [math.DS])
    The discovery of governing equations from data has been an active field of research for decades. One widely used methodology for this purpose is sparse regression for nonlinear dynamics, known as SINDy. Despite several attempts, noisy and scarce data still pose a severe challenge to the success of the SINDy approach. In this work, we discuss a robust method to discover nonlinear governing equations from noisy and scarce data. To do this, we make use of neural networks to learn an implicit representation based on measurement data so that not only it produces the output in the vicinity of the measurements but also the time-evolution of output can be described by a dynamical system. Additionally, we learn such a dynamic system in the spirit of the SINDy framework. Leveraging the implicit representation using neural networks, we obtain the derivative information -- required for SINDy -- using an automatic differentiation tool. To enhance the robustness of our methodology, we further incorporate an integral condition on the output of the implicit networks. Furthermore, we extend our methodology to handle data collected from multiple initial conditions. We demonstrate the efficiency of the proposed methodology to discover governing equations under noisy and scarce data regimes by means of several examples and compare its performance with existing methods.
    SpikeCP: Delay-Adaptive Reliable Spiking Neural Networks via Conformal Prediction. (arXiv:2305.11322v3 [cs.NE] UPDATED)
    Spiking neural networks (SNNs) process time-series data via internal event-driven neural dynamics whose energy consumption depends on the number of spikes exchanged between neurons over the course of the input presentation. In typical implementations of an SNN classifier, decisions are produced after the entire input sequence has been processed, resulting in latency and energy consumption levels that are fairly uniform across inputs. Recently introduced delay-adaptive SNNs tailor the inference latency -- and, with it, the energy consumption -- to the difficulty of each example, by producing an early decision when the SNN model is sufficiently ``confident''. In this paper, we start by observing that, as an SNN processes input samples, its classification decisions tend to be first under-confident and then over-confident with respect to the decision's ground-truth, unknown, test accuracy. This makes it difficult to determine a stopping time that ensures a desired level of accuracy. To address this problem, we introduce a novel delay-adaptive SNN-based inference methodology that, wrapping around any pre-trained SNN classifier, provides guaranteed reliability for the decisions produced at input-dependent stopping times. The approach entails minimal added complexity as compared to the underlying SNN, requiring only thresholding and counting operations at run time, and it leverages tools from conformal prediction (CP).
    Tree of Uncertain Thoughts Reasoning for Large Language Models. (arXiv:2309.07694v1 [cs.CL])
    While the recently introduced Tree of Thoughts (ToT) has heralded advancements in allowing Large Language Models (LLMs) to reason through foresight and backtracking for global decision-making, it has overlooked the inherent local uncertainties in intermediate decision points or "thoughts". These local uncertainties, intrinsic to LLMs given their potential for diverse responses, remain a significant concern in the reasoning process. Addressing this pivotal gap, we introduce the Tree of Uncertain Thoughts (TouT) - a reasoning framework tailored for LLMs. Our TouT effectively leverages Monte Carlo Dropout to quantify uncertainty scores associated with LLMs' diverse local responses at these intermediate steps. By marrying this local uncertainty quantification with global search algorithms, TouT enhances the model's precision in response generation. We substantiate our approach with rigorous experiments on two demanding planning tasks: Game of 24 and Mini Crosswords. The empirical evidence underscores TouT's superiority over both ToT and chain-of-thought prompting methods.
    Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning. (arXiv:2309.07578v1 [cs.LG])
    We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment. Specifically, we aim to improve the agent's ability to generalize to out-of-distribution goals. To achieve this, we propose to learn a dynamics model and check if it is equivariant with respect to a fixed type of transformation, namely translations in the state space. We then use an entropy regularizer to increase the equivariant set and augment the dataset with the resulting transformed samples. Finally, we learn a new policy offline based on the augmented dataset, with an off-the-shelf offline RL algorithm. Our experimental results demonstrate that our approach can greatly improve the test performance of the policy on the considered environments.
    BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability. (arXiv:2105.00495v2 [cs.LG] UPDATED)
    Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack. The lack of information on unknown potential attacks makes detecting adversarial examples challenging. Additionally, attackers do not need to follow the rules made by the defender. To address this problem, we take inspiration from the concept of Applicability Domain in cheminformatics. Cheminformatics models struggle to make accurate predictions because only a limited number of compounds are known and available for training. Applicability Domain defines a domain based on the known compounds and rejects any unknown compound that falls outside the domain. Similarly, adversarial examples start as harmless inputs, but can be manipulated to evade reliable classification by moving outside the domain of the classifier. We are the first to identify the similarity between Applicability Domain and adversarial detection. Instead of focusing on unknown attacks, we focus on what is known, the training data. We propose a simple yet robust triple-stage data-driven framework that checks the input globally and locally, and confirms that they are coherent with the model's output. This framework can be applied to any classification model and is not limited to specific attacks. We demonstrate these three stages work as one unit, effectively detecting various attacks, even for a white-box scenario.
    Joint Community Detection and Rotational Synchronization via Semidefinite Programming. (arXiv:2105.06031v2 [stat.ML] UPDATED)
    In the presence of heterogeneous data, where randomly rotated objects fall into multiple underlying categories, it is challenging to simultaneously classify them into clusters and synchronize them based on pairwise relations. This gives rise to the joint problem of community detection and synchronization. We propose a series of semidefinite relaxations, and prove their exact recovery when extending the celebrated stochastic block model to this new setting where both rotations and cluster identities are to be determined. Numerical experiments demonstrate the efficacy of our proposed algorithms and confirm our theoretical result which indicates a sharp phase transition for exact recovery.
    A Latent Space Theory for Emergent Abilities in Large Language Models. (arXiv:2304.09960v3 [cs.CL] UPDATED)
    Languages are not created randomly but rather to communicate information. There is a strong association between languages and their underlying meanings, resulting in a sparse joint distribution that is heavily peaked according to their correlations. Moreover, these peak values happen to match with the marginal distribution of languages due to the sparsity. With the advent of LLMs trained on big data and large models, we can now precisely assess the marginal distribution of languages, providing a convenient means of exploring the sparse structures in the joint distribution for effective inferences. In this paper, we categorize languages as either unambiguous or {\epsilon}-ambiguous and present quantitative results to demonstrate that the emergent abilities of LLMs, such as language understanding, in-context learning, chain-of-thought prompting, and effective instruction fine-tuning, can all be attributed to Bayesian inference on the sparse joint distribution of languages.
    Federated Dataset Dictionary Learning for Multi-Source Domain Adaptation. (arXiv:2309.07670v1 [cs.LG])
    In this article, we propose an approach for federated domain adaptation, a setting where distributional shift exists among clients and some have unlabeled data. The proposed framework, FedDaDiL, tackles the resulting challenge through dictionary learning of empirical distributions. In our setting, clients' distributions represent particular domains, and FedDaDiL collectively trains a federated dictionary of empirical distributions. In particular, we build upon the Dataset Dictionary Learning framework by designing collaborative communication protocols and aggregation operations. The chosen protocols keep clients' data private, thus enhancing overall privacy compared to its centralized counterpart. We empirically demonstrate that our approach successfully generates labeled data on the target domain with extensive experiments on (i) Caltech-Office, (ii) TEP, and (iii) CWRU benchmarks. Furthermore, we compare our method to its centralized counterpart and other benchmarks in federated domain adaptation.
    Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks. (arXiv:2302.07260v5 [cs.LG] UPDATED)
    Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome the major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augmented the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained multi-fidelity optimization task involving shape optimization of rotor blades in turbo-machinery.
    On the complexity of finding a local minimizer of a quadratic function over a polytope. (arXiv:2008.05558v5 [math.OC] UPDATED)
    We show that unless P=NP, there cannot be a polynomial-time algorithm that finds a point within Euclidean distance $c^n$ (for any constant $c \ge 0$) of a local minimizer of an $n$-variate quadratic function over a polytope. This result (even with $c=0$) answers a question of Pardalos and Vavasis that appeared in 1992 on a list of seven open problems in complexity theory for numerical optimization. Our proof technique also implies that the problem of deciding whether a quadratic function has a local minimizer over an (unbounded) polyhedron, and that of deciding if a quartic polynomial has a local minimizer are NP-hard.
    Physics-constrained robust learning of open-form PDEs from limited and noisy data. (arXiv:2309.07672v1 [cs.LG])
    Unveiling the underlying governing equations of nonlinear dynamic systems remains a significant challenge, especially when encountering noisy observations and no prior knowledge available. This study proposes R-DISCOVER, a framework designed to robustly uncover open-form partial differential equations (PDEs) from limited and noisy data. The framework operates through two alternating update processes: discovering and embedding. The discovering phase employs symbolic representation and a reinforcement learning (RL)-guided hybrid PDE generator to efficiently produce diverse open-form PDEs with tree structures. A neural network-based predictive model fits the system response and serves as the reward evaluator for the generated PDEs. PDEs with superior fits are utilized to iteratively optimize the generator via the RL method and the best-performing PDE is selected by a parameter-free stability metric. The embedding phase integrates the initially identified PDE from the discovering process as a physical constraint into the predictive model for robust training. The traversal of PDE trees automates the construction of the computational graph and the embedding process without human intervention. Numerical experiments demonstrate our framework's capability to uncover governing equations from nonlinear dynamic systems with limited and highly noisy data and outperform other physics-informed neural network-based discovery methods. This work opens new potential for exploring real-world systems with limited understanding.
    TrojViT: Trojan Insertion in Vision Transformers. (arXiv:2208.13049v4 [cs.LG] UPDATED)
    Vision Transformers (ViTs) have demonstrated the state-of-the-art performance in various vision-related tasks. The success of ViTs motivates adversaries to perform backdoor attacks on ViTs. Although the vulnerability of traditional CNNs to backdoor attacks is well-known, backdoor attacks on ViTs are seldom-studied. Compared to CNNs capturing pixel-wise local features by convolutions, ViTs extract global context information through patches and attentions. Na\"ively transplanting CNN-specific backdoor attacks to ViTs yields only a low clean data accuracy and a low attack success rate. In this paper, we propose a stealth and practical ViT-specific backdoor attack $TrojViT$. Rather than an area-wise trigger used by CNN-specific backdoor attacks, TrojViT generates a patch-wise trigger designed to build a Trojan composed of some vulnerable bits on the parameters of a ViT stored in DRAM memory through patch salience ranking and attention-target loss. TrojViT further uses minimum-tuned parameter update to reduce the bit number of the Trojan. Once the attacker inserts the Trojan into the ViT model by flipping the vulnerable bits, the ViT model still produces normal inference accuracy with benign inputs. But when the attacker embeds a trigger into an input, the ViT model is forced to classify the input to a predefined target class. We show that flipping only few vulnerable bits identified by TrojViT on a ViT model using the well-known RowHammer can transform the model into a backdoored one. We perform extensive experiments of multiple datasets on various ViT models. TrojViT can classify $99.64\%$ of test images to a target class by flipping $345$ bits on a ViT for ImageNet.Our codes are available at https://github.com/mxzheng/TrojViT
    Safe and Accelerated Deep Reinforcement Learning-based O-RAN Slicing: A Hybrid Transfer Learning Approach. (arXiv:2309.07265v1 [cs.NI])
    The open radio access network (O-RAN) architecture supports intelligent network control algorithms as one of its core capabilities. Data-driven applications incorporate such algorithms to optimize radio access network (RAN) functions via RAN intelligent controllers (RICs). Deep reinforcement learning (DRL) algorithms are among the main approaches adopted in the O-RAN literature to solve dynamic radio resource management problems. However, despite the benefits introduced by the O-RAN RICs, the practical adoption of DRL algorithms in real network deployments falls behind. This is primarily due to the slow convergence and unstable performance exhibited by DRL agents upon deployment and when facing previously unseen network conditions. In this paper, we address these challenges by proposing transfer learning (TL) as a core component of the training and deployment workflows for the DRL-based closed-loop control of O-RAN functionalities. To this end, we propose and design a hybrid TL-aided approach that leverages the advantages of both policy reuse and distillation TL methods to provide safe and accelerated convergence in DRL-based O-RAN slicing. We conduct a thorough experiment that accommodates multiple services, including real VR gaming traffic to reflect practical scenarios of O-RAN slicing. We also propose and implement policy reuse and distillation-aided DRL and non-TL-aided DRL as three separate baselines. The proposed hybrid approach shows at least: 7.7% and 20.7% improvements in the average initial reward value and the percentage of converged scenarios, and a 64.6% decrease in reward variance while maintaining fast convergence and enhancing the generalizability compared with the baselines.
    Self-Supervised Blind Source Separation via Multi-Encoder Autoencoders. (arXiv:2309.07138v1 [eess.SP])
    The task of blind source separation (BSS) involves separating sources from a mixture without prior knowledge of the sources or the mixing system. This is a challenging problem that often requires making restrictive assumptions about both the mixing system and the sources. In this paper, we propose a novel method for addressing BSS of non-linear mixtures by leveraging the natural feature subspace specialization ability of multi-encoder autoencoders with fully self-supervised learning without strong priors. During the training phase, our method unmixes the input into the separate encoding spaces of the multi-encoder network and then remixes these representations within the decoder for a reconstruction of the input. Then to perform source inference, we introduce a novel encoding masking technique whereby masking out all but one of the encodings enables the decoder to estimate a source signal. To this end, we also introduce a so-called pathway separation loss that encourages sparsity between the unmixed encoding spaces throughout the decoder's layers and a so-called zero reconstruction loss on the decoder for coherent source estimations. In order to carefully evaluate our method, we conduct experiments on a toy dataset and with real-world biosignal recordings from a polysomnography sleep study for extracting respiration.
    Dataset Size Dependence of Rate-Distortion Curve and Threshold of Posterior Collapse in Linear VAE. (arXiv:2309.07663v1 [stat.ML])
    In the Variational Autoencoder (VAE), the variational posterior often aligns closely with the prior, which is known as posterior collapse and hinders the quality of representation learning. To mitigate this problem, an adjustable hyperparameter beta has been introduced in the VAE. This paper presents a closed-form expression to assess the relationship between the beta in VAE, the dataset size, the posterior collapse, and the rate-distortion curve by analyzing a minimal VAE in a high-dimensional limit. These results clarify that a long plateau in the generalization error emerges with a relatively larger beta. As the beta increases, the length of the plateau extends and then becomes infinite beyond a certain beta threshold. This implies that the choice of beta, unlike the usual regularization parameters, can induce posterior collapse regardless of the dataset size. Thus, beta is a risky parameter that requires careful tuning. Furthermore, considering the dataset-size dependence on the rate-distortion curve, a relatively large dataset is required to obtain a rate-distortion curve with high rates. Extensive numerical experiments support our analysis.
    Efficient quantum recurrent reinforcement learning via quantum reservoir computing. (arXiv:2309.07339v1 [quant-ph])
    Quantum reinforcement learning (QRL) has emerged as a framework to solve sequential decision-making tasks, showcasing empirical quantum advantages. A notable development is through quantum recurrent neural networks (QRNNs) for memory-intensive tasks such as partially observable environments. However, QRL models incorporating QRNN encounter challenges such as inefficient training of QRL with QRNN, given that the computation of gradients in QRNN is both computationally expensive and time-consuming. This work presents a novel approach to address this challenge by constructing QRL agents utilizing QRNN-based reservoirs, specifically employing quantum long short-term memory (QLSTM). QLSTM parameters are randomly initialized and fixed without training. The model is trained using the asynchronous advantage actor-aritic (A3C) algorithm. Through numerical simulations, we validate the efficacy of our QLSTM-Reservoir RL framework. Its performance is assessed on standard benchmarks, demonstrating comparable results to a fully trained QLSTM RL model with identical architecture and training settings.
    Speeding up Learning Quantum States through Group Equivariant Convolutional Quantum Ans\"atze. (arXiv:2112.07611v3 [quant-ph] UPDATED)
    We develop a theoretical framework for $S_n$-equivariant convolutional quantum circuits with SU$(d)$-symmetry, building on and significantly generalizing Jordan's Permutational Quantum Computing (PQC) formalism based on Schur-Weyl duality connecting both SU$(d)$ and $S_n$ actions on qudits. In particular, we utilize the Okounkov-Vershik approach to prove Harrow's statement (Ph.D. Thesis 2005 p.160) on the equivalence between $\operatorname{SU}(d)$ and $S_n$ irrep bases and to establish the $S_n$-equivariant Convolutional Quantum Alternating Ans\"atze ($S_n$-CQA) using Young-Jucys-Murphy (YJM) elements. We prove that $S_n$-CQA is able to generate any unitary in any given $S_n$ irrep sector, which may serve as a universal model for a wide array of quantum machine learning problems with the presence of SU($d$) symmetry. Our method provides another way to prove the universality of Quantum Approximate Optimization Algorithm (QAOA) and verifies that 4-local SU($d$) symmetric unitaries are sufficient to build generic SU($d$) symmetric quantum circuits up to relative phase factors. We present numerical simulations to showcase the effectiveness of the ans\"atze to find the ground state energy of the $J_1$--$J_2$ antiferromagnetic Heisenberg model on the rectangular and Kagome lattices. Our work provides the first application of the celebrated Okounkov-Vershik's $S_n$ representation theory to quantum physics and machine learning, from which to propose quantum variational ans\"atze that strongly suggests to be classically intractable tailored towards a specific optimization problem.
    Proximal Bellman mappings for reinforcement learning and their application to robust adaptive filtering. (arXiv:2309.07548v1 [eess.SP])
    This paper aims at the algorithmic/theoretical core of reinforcement learning (RL) by introducing the novel class of proximal Bellman mappings. These mappings are defined in reproducing kernel Hilbert spaces (RKHSs), to benefit from the rich approximation properties and inner product of RKHSs, they are shown to belong to the powerful Hilbertian family of (firmly) nonexpansive mappings, regardless of the values of their discount factors, and possess ample degrees of design freedom to even reproduce attributes of the classical Bellman mappings and to pave the way for novel RL designs. An approximate policy-iteration scheme is built on the proposed class of mappings to solve the problem of selecting online, at every time instance, the "optimal" exponent $p$ in a $p$-norm loss to combat outliers in linear adaptive filtering, without training data and any knowledge on the statistical properties of the outliers. Numerical tests on synthetic data showcase the superior performance of the proposed framework over several non-RL and kernel-based RL schemes.
    Efficient Learning of PDEs via Taylor Expansion and Sparse Decomposition into Value and Fourier Domains. (arXiv:2309.07344v1 [cs.LG])
    Accelerating the learning of Partial Differential Equations (PDEs) from experimental data will speed up the pace of scientific discovery. Previous randomized algorithms exploit sparsity in PDE updates for acceleration. However such methods are applicable to a limited class of decomposable PDEs, which have sparse features in the value domain. We propose Reel, which accelerates the learning of PDEs via random projection and has much broader applicability. Reel exploits the sparsity by decomposing dense updates into sparse ones in both the value and frequency domains. This decomposition enables efficient learning when the source of the updates consists of gradually changing terms across large areas (sparse in the frequency domain) in addition to a few rapid updates concentrated in a small set of "interfacial" regions (sparse in the value domain). Random projection is then applied to compress the sparse signals for learning. To expand the model applicability, Taylor series expansion is used in Reel to approximate the nonlinear PDE updates with polynomials in the decomposable form. Theoretically, we derive a constant factor approximation between the projected loss function and the original one with poly-logarithmic number of projected dimensions. Experimentally, we provide empirical evidence that our proposed Reel can lead to faster learning of PDE models (70-98% reduction in training time when the data is compressed to 1% of its original size) with comparable quality as the non-compressed models.
    Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?. (arXiv:2309.07452v1 [cs.LG])
    A rising trend in theoretical deep learning is to understand why deep learning works through Neural Tangent Kernel (NTK) [jgh18], a kernel method that is equivalent to using gradient descent to train a multi-layer infinitely-wide neural network. NTK is a major step forward in the theoretical deep learning because it allows researchers to use traditional mathematical tools to analyze properties of deep neural networks and to explain various neural network techniques from a theoretical view. A natural extension of NTK on graph learning is \textit{Graph Neural Tangent Kernel (GNTK)}, and researchers have already provide GNTK formulation for graph-level regression and show empirically that this kernel method can achieve similar accuracy as GNNs on various bioinformatics datasets [dhs+19]. The remaining question now is whether solving GNTK regression is equivalent to training an infinite-wide multi-layer GNN using gradient descent. In this paper, we provide three new theoretical results. First, we formally prove this equivalence for graph-level regression. Second, we present the first GNTK formulation for node-level regression. Finally, we prove the equivalence for node-level regression.
    Rates of Convergence in Certain Native Spaces of Approximations used in Reinforcement Learning. (arXiv:2309.07383v1 [eess.SY])
    This paper studies convergence rates for some value function approximations that arise in a collection of reproducing kernel Hilbert spaces (RKHS) $H(\Omega)$. By casting an optimal control problem in a specific class of native spaces, strong rates of convergence are derived for the operator equation that enables offline approximations that appear in policy iteration. Explicit upper bounds on error in value function approximations are derived in terms of power function $\Pwr_{H,N}$ for the space of finite dimensional approximants $H_N$ in the native space $H(\Omega)$. These bounds are geometric in nature and refine some well-known, now classical results concerning convergence of approximations of value functions.
    GBE-MLZSL: A Group Bi-Enhancement Framework for Multi-Label Zero-Shot Learning. (arXiv:2309.00923v2 [cs.CV] UPDATED)
    This paper investigates a challenging problem of zero-shot learning in the multi-label scenario (MLZSL), wherein, the model is trained to recognize multiple unseen classes within a sample (e.g., an image) based on seen classes and auxiliary knowledge, e.g., semantic information. Existing methods usually resort to analyzing the relationship of various seen classes residing in a sample from the dimension of spatial or semantic characteristics, and transfer the learned model to unseen ones. But they ignore the effective integration of local and global features. That is, in the process of inferring unseen classes, global features represent the principal direction of the image in the feature space, while local features should maintain uniqueness within a certain range. This integrated neglect will make the model lose its grasp of the main components of the image. Relying only on the local existence of seen classes during the inference stage introduces unavoidable bias. In this paper, we propose a novel and effective group bi-enhancement framework for MLZSL, dubbed GBE-MLZSL, to fully make use of such properties and enable a more accurate and robust visual-semantic projection. Specifically, we split the feature maps into several feature groups, of which each feature group can be trained independently with the Local Information Distinguishing Module (LID) to ensure uniqueness. Meanwhile, a Global Enhancement Module (GEM) is designed to preserve the principal direction. Besides, a static graph structure is designed to construct the correlation of local features. Experiments on large-scale MLZSL benchmark datasets NUS-WIDE and Open-Images-v4 demonstrate that the proposed GBE-MLZSL outperforms other state-of-the-art methods with large margins.
    ETP: Learning Transferable ECG Representations via ECG-Text Pre-training. (arXiv:2309.07145v1 [eess.SP])
    In the domain of cardiovascular healthcare, the Electrocardiogram (ECG) serves as a critical, non-invasive diagnostic tool. Although recent strides in self-supervised learning (SSL) have been promising for ECG representation learning, these techniques often require annotated samples and struggle with classes not present in the fine-tuning stages. To address these limitations, we introduce ECG-Text Pre-training (ETP), an innovative framework designed to learn cross-modal representations that link ECG signals with textual reports. For the first time, this framework leverages the zero-shot classification task in the ECG domain. ETP employs an ECG encoder along with a pre-trained language model to align ECG signals with their corresponding textual reports. The proposed framework excels in both linear evaluation and zero-shot classification tasks, as demonstrated on the PTB-XL and CPSC2018 datasets, showcasing its ability for robust and generalizable cross-modal ECG feature learning.
    Bringing PDEs to JAX with forward and reverse modes automatic differentiation. (arXiv:2309.07137v1 [cs.MS])
    Partial differential equations (PDEs) are used to describe a variety of physical phenomena. Often these equations do not have analytical solutions and numerical approximations are used instead. One of the common methods to solve PDEs is the finite element method. Computing derivative information of the solution with respect to the input parameters is important in many tasks in scientific computing. We extend JAX automatic differentiation library with an interface to Firedrake finite element library. High-level symbolic representation of PDEs allows bypassing differentiating through low-level possibly many iterations of the underlying nonlinear solvers. Differentiating through Firedrake solvers is done using tangent-linear and adjoint equations. This enables the efficient composition of finite element solvers with arbitrary differentiable programs. The code is available at github.com/IvanYashchuk/jax-firedrake.
    Gibbs-Duhem-Informed Neural Networks for Binary Activity Coefficient Prediction. (arXiv:2306.07937v2 [physics.chem-ph] UPDATED)
    We propose Gibbs-Duhem-informed neural networks for the prediction of binary activity coefficients at varying compositions. That is, we include the Gibbs-Duhem equation explicitly in the loss function for training neural networks, which is straightforward in standard machine learning (ML) frameworks enabling automatic differentiation. In contrast to recent hybrid ML approaches, our approach does not rely on embedding a specific thermodynamic model inside the neural network and corresponding prediction limitations. Rather, Gibbs-Duhem consistency serves as regularization, with the flexibility of ML models being preserved. Our results show increased thermodynamic consistency and generalization capabilities for activity coefficient predictions by Gibbs-Duhem-informed graph neural networks and matrix completion methods. We also find that the model architecture, particularly the activation function, can have a strong influence on the prediction quality. The approach can be easily extended to account for other thermodynamic consistency conditions.
    MELAGE: A purely python based Neuroimaging software (Neonatal). (arXiv:2309.07175v1 [eess.IV])
    MELAGE, a pioneering Python-based neuroimaging software, emerges as a versatile tool for the visualization, processing, and analysis of medical images. Initially conceived to address the unique challenges of processing 3D ultrasound and MRI brain images during the neonatal period, MELAGE exhibits remarkable adaptability, extending its utility to the domain of adult human brain imaging. At its core, MELAGE features a semi-automatic brain extraction tool empowered by a deep learning module, ensuring precise and efficient brain structure extraction from MRI and 3D Ultrasound data. Moreover, MELAGE offers a comprehensive suite of features, encompassing dynamic 3D visualization, accurate measurements, and interactive image segmentation. This transformative software holds immense promise for researchers and clinicians, offering streamlined image analysis, seamless integration with deep learning algorithms, and broad applicability in the realm of medical imaging.
    EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought. (arXiv:2305.15021v2 [cs.RO] UPDATED)
    Embodied AI is a crucial frontier in robotics, capable of planning and executing action sequences for robots to accomplish long-horizon tasks in physical environments. In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities. To achieve this, we have made the following efforts: (i) We craft a large-scale embodied planning dataset, termed EgoCOT. The dataset consists of carefully selected videos from the Ego4D dataset, along with corresponding high-quality language instructions. Specifically, we generate a sequence of sub-goals with the "Chain of Thoughts" mode for effective embodied planning. (ii) We introduce an efficient training approach to EmbodiedGPT for high-quality plan generation, by adapting a 7B large language model (LLM) to the EgoCOT dataset via prefix tuning. (iii) We introduce a paradigm for extracting task-related features from LLM-generated planning queries to form a closed loop between high-level planning and low-level control. Extensive experiments show the effectiveness of EmbodiedGPT on embodied tasks, including embodied planning, embodied control, visual captioning, and visual question answering. Notably, EmbodiedGPT significantly enhances the success rate of the embodied control task by extracting more effective features. It has achieved a remarkable 1.6 times increase in success rate on the Franka Kitchen benchmark and a 1.3 times increase on the Meta-World benchmark, compared to the BLIP-2 baseline fine-tuned with the Ego4D dataset.
    EpiDeNet: An Energy-Efficient Approach to Seizure Detection for Embedded Systems. (arXiv:2309.07135v1 [eess.SP])
    Epilepsy is a prevalent neurological disorder that affects millions of individuals globally, and continuous monitoring coupled with automated seizure detection appears as a necessity for effective patient treatment. To enable long-term care in daily-life conditions, comfortable and smart wearable devices with long battery life are required, which in turn set the demand for resource-constrained and energy-efficient computing solutions. In this context, the development of machine learning algorithms for seizure detection faces the challenge of heavily imbalanced datasets. This paper introduces EpiDeNet, a new lightweight seizure detection network, and Sensitivity-Specificity Weighted Cross-Entropy (SSWCE), a new loss function that incorporates sensitivity and specificity, to address the challenge of heavily unbalanced datasets. The proposed EpiDeNet-SSWCE approach demonstrates the successful detection of 91.16% and 92.00% seizure events on two different datasets (CHB-MIT and PEDESITE, respectively), with only four EEG channels. A three-window majority voting-based smoothing scheme combined with the SSWCE loss achieves 3x reduction of false positives to 1.18 FP/h. EpiDeNet is well suited for implementation on low-power embedded platforms, and we evaluate its performance on two ARM Cortex-based platforms (M4F/M7) and two parallel ultra-low power (PULP) systems (GAP8, GAP9). The most efficient implementation (GAP9) achieves an energy efficiency of 40 GMAC/s/W, with an energy consumption per inference of only 0.051 mJ at high performance (726.46 MMAC/s), outperforming the best ARM Cortex-based solutions by approximately 160x in energy efficiency. The EpiDeNet-SSWCE method demonstrates effective and accurate seizure detection performance on heavily imbalanced datasets, while being suited for implementation on energy-constrained platforms.
    Detecting Unknown Attacks in IoT Environments: An Open Set Classifier for Enhanced Network Intrusion Detection. (arXiv:2309.07461v1 [cs.CR])
    The widespread integration of Internet of Things (IoT) devices across all facets of life has ushered in an era of interconnectedness, creating new avenues for cybersecurity challenges and underscoring the need for robust intrusion detection systems. However, traditional security systems are designed with a closed-world perspective and often face challenges in dealing with the ever-evolving threat landscape, where new and unfamiliar attacks are constantly emerging. In this paper, we introduce a framework aimed at mitigating the open set recognition (OSR) problem in the realm of Network Intrusion Detection Systems (NIDS) tailored for IoT environments. Our framework capitalizes on image-based representations of packet-level data, extracting spatial and temporal patterns from network traffic. Additionally, we integrate stacking and sub-clustering techniques, enabling the identification of unknown attacks by effectively modeling the complex and diverse nature of benign behavior. The empirical results prominently underscore the framework's efficacy, boasting an impressive 88\% detection rate for previously unseen attacks when compared against existing approaches and recent advancements. Future work will perform extensive experimentation across various openness levels and attack scenarios, further strengthening the adaptability and performance of our proposed solution in safeguarding IoT environments.
    Choosing a Proxy Metric from Past Experiments. (arXiv:2309.07893v1 [stat.ME])
    In many randomized experiments, the treatment effect of the long-term metric (i.e. the primary outcome of interest) is often difficult or infeasible to measure. Such long-term metrics are often slow to react to changes and sufficiently noisy they are challenging to faithfully estimate in short-horizon experiments. A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric -- so they can be used to effectively guide decision-making in the near-term. We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments. Our procedure first reduces the construction of an optimal proxy metric in a given experiment to a portfolio optimization problem which depends on the true latent treatment effects and noise level of experiment under consideration. We then denoise the observed treatment effects of the long-term metric and a set of proxies in a historical corpus of randomized experiments to extract estimates of the latent treatment effects for use in the optimization problem. One key insight derived from our approach is that the optimal proxy metric for a given experiment is not apriori fixed; rather it should depend on the sample size (or effective noise level) of the randomized experiment for which it is deployed. To instantiate and evaluate our framework, we employ our methodology in a large corpus of randomized experiments from an industrial recommendation system and construct proxy metrics that perform favorably relative to several baselines.
    Attention-based Dynamic Graph Convolutional Recurrent Neural Network for Traffic Flow Prediction in Highway Transportation. (arXiv:2309.07196v1 [cs.LG])
    As one of the important tools for spatial feature extraction, graph convolution has been applied in a wide range of fields such as traffic flow prediction. However, current popular works of graph convolution cannot guarantee spatio-temporal consistency in a long period. The ignorance of correlational dynamics, convolutional locality and temporal comprehensiveness would limit predictive accuracy. In this paper, a novel Attention-based Dynamic Graph Convolutional Recurrent Neural Network (ADGCRNN) is proposed to improve traffic flow prediction in highway transportation. Three temporal resolutions of data sequence are effectively integrated by self-attention to extract characteristics; multi-dynamic graphs and their weights are dynamically created to compliantly combine the varying characteristics; a dedicated gated kernel emphasizing highly relative nodes is introduced on these complete graphs to reduce overfitting for graph convolution operations. Experiments on two public datasets show our work better than state-of-the-art baselines, and case studies of a real Web system prove practical benefit in highway transportation.
    User Training with Error Augmentation for Electromyogram-based Gesture Classification. (arXiv:2309.07289v1 [cs.HC])
    We designed and tested a system for real-time control of a user interface by extracting surface electromyographic (sEMG) activity from eight electrodes in a wrist-band configuration. sEMG data were streamed into a machine-learning algorithm that classified hand gestures in real-time. After an initial model calibration, participants were presented with one of three types of feedback during a human-learning stage: veridical feedback, in which predicted probabilities from the gesture classification algorithm were displayed without alteration, modified feedback, in which we applied a hidden augmentation of error to these probabilities, and no feedback. User performance was then evaluated in a series of minigames, in which subjects were required to use eight gestures to manipulate their game avatar to complete a task. Experimental results indicated that, relative to baseline, the modified feedback condition led to significantly improved accuracy and improved gesture class separation. These findings suggest that real-time feedback in a gamified user interface with manipulation of feedback may enable intuitive, rapid, and accurate task acquisition for sEMG-based gesture recognition applications.
    Optimal and Fair Encouragement Policy Evaluation and Learning. (arXiv:2309.07176v1 [cs.LG])
    In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. In these same domains, there may be heterogeneity both in who responds in taking-up treatment, and heterogeneity in treatment efficacy. While optimal treatment rules can maximize causal outcomes across the population, access parity constraints or other fairness considerations can be relevant in the case of encouragement. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When in addition the decision-maker has distributional preferences over both access and average outcomes, the optimal decision rule changes. We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity. We consider fairness constraints such as demographic parity in treatment take-up, and other constraints, via constrained optimization. Our framework can be extended to handle algorithmic recommendations under an often-reasonable covariate-conditional exclusion restriction, using our robustness checks for lack of positivity in the recommendation. We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds. We illustrate the methods in two case studies based on data from randomized encouragement to enroll in insurance and from pretrial supervised release with electronic monitoring.
    Deep Spatiotemporal Clustering: A Temporal Clustering Approach for Multi-dimensional Climate Data. (arXiv:2304.14541v2 [cs.LG] UPDATED)
    Clustering high-dimensional spatiotemporal data using an unsupervised approach is a challenging problem for many data-driven applications. Existing state-of-the-art methods for unsupervised clustering use different similarity and distance functions but focus on either spatial or temporal features of the data. Concentrating on joint deep representation learning of spatial and temporal features, we propose Deep Spatiotemporal Clustering (DSC), a novel algorithm for the temporal clustering of high-dimensional spatiotemporal data using an unsupervised deep learning method. Inspired by the U-net architecture, DSC utilizes an autoencoder integrating CNN-RNN layers to learn latent representations of the spatiotemporal data. DSC also includes a unique layer for cluster assignment on latent representations that uses the Student's t-distribution. By optimizing the clustering loss and data reconstruction loss simultaneously, the algorithm gradually improves clustering assignments and the nonlinear mapping between low-dimensional latent feature space and high-dimensional original data space. A multivariate spatiotemporal climate dataset is used to evaluate the efficacy of the proposed method. Our extensive experiments show our approach outperforms both conventional and deep learning-based unsupervised clustering algorithms. Additionally, we compared the proposed model with its various variants (CNN encoder, CNN autoencoder, CNN-RNN encoder, CNN-RNN autoencoder, etc.) to get insight into using both the CNN and RNN layers in the autoencoder, and our proposed technique outperforms these variants in terms of clustering results.
    Short-term power load forecasting method based on CNN-SAEDN-Res. (arXiv:2309.07140v1 [eess.SP])
    In deep learning, the load data with non-temporal factors are difficult to process by sequence models. This problem results in insufficient precision of the prediction. Therefore, a short-term load forecasting method based on convolutional neural network (CNN), self-attention encoder-decoder network (SAEDN) and residual-refinement (Res) is proposed. In this method, feature extraction module is composed of a two-dimensional convolutional neural network, which is used to mine the local correlation between data and obtain high-dimensional data features. The initial load fore-casting module consists of a self-attention encoder-decoder network and a feedforward neural network (FFN). The module utilizes self-attention mechanisms to encode high-dimensional features. This operation can obtain the global correlation between data. Therefore, the model is able to retain important information based on the coupling relationship between the data in data mixed with non-time series factors. Then, self-attention decoding is per-formed and the feedforward neural network is used to regression initial load. This paper introduces the residual mechanism to build the load optimization module. The module generates residual load values to optimize the initial load. The simulation results show that the proposed load forecasting method has advantages in terms of prediction accuracy and prediction stability.  ( 2 min )
    TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering. (arXiv:2212.04953v2 [q-bio.GN] UPDATED)
    Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally-inefficient and memory-hungry; bottlenecking the entire genome analysis pipeline. However, for many applications, the majority of reads do no match the reference genome of interest (i.e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall's key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads; and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. TargetCall aims to filter out all off-target reads before basecalling. The highly-accurate but slow basecalling is performed only on the raw signals whose noisy reads are labeled as on-target. Our thorough experimental evaluations using both real and simulated data show that TargetCall 1) improves the end-to-end basecalling performance while maintaining high sensitivity in keeping on-target reads, 2) maintains high accuracy in downstream analysis, 3) precisely filters out up to 94.71% of off-target reads, and 4) achieves better performance, throughput, sensitivity, precision, and generality compared to prior works. We open-source TargetCall at https://github.com/CMU-SAFARI/TargetCall  ( 3 min )
    Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget. (arXiv:2304.10520v2 [cs.CV] UPDATED)
    Masked Image Modeling (MIM) methods, like Masked Autoencoders (MAE), efficiently learn a rich representation of the input. However, for adapting to downstream tasks, they require a sufficient amount of labeled data since their rich features code not only objects but also less relevant image background. In contrast, Instance Discrimination (ID) methods focus on objects. In this work, we study how to combine the efficiency and scalability of MIM with the ability of ID to perform downstream classification in the absence of large amounts of labeled data. To this end, we introduce Masked Autoencoder Contrastive Tuning (MAE-CT), a sequential approach that utilizes the implicit clustering of the Nearest Neighbor Contrastive Learning (NNCLR) objective to induce abstraction in the topmost layers of a pre-trained MAE. MAE-CT tunes the rich features such that they form semantic clusters of objects without using any labels. Notably, MAE-CT does not rely on hand-crafted augmentations and frequently achieves its best performances while using only minimal augmentations (crop & flip). Further, MAE-CT is compute efficient as it requires at most 10% overhead compared to MAE re-training. Applied to large and huge Vision Transformer (ViT) models, MAE-CT excels over previous self-supervised methods trained on ImageNet in linear probing, k-NN and low-shot classification accuracy as well as in unsupervised clustering accuracy. With ViT-H/16 MAE-CT achieves a new state-of-the-art in linear probing of 82.2%.  ( 3 min )
    All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks. (arXiv:2309.07250v1 [quant-ph])
    Variational algorithms require architectures that naturally constrain the optimisation space to run efficiently. In geometric quantum machine learning, one achieves this by encoding group structure into parameterised quantum circuits to include the symmetries of a problem as an inductive bias. However, constructing such circuits is challenging as a concrete guiding principle has yet to emerge. In this paper, we propose the use of spin networks, a form of directed tensor network invariant under a group transformation, to devise SU(2) equivariant quantum circuit ans\"atze -- circuits possessing spin rotation symmetry. By changing to the basis that block diagonalises SU(2) group action, these networks provide a natural building block for constructing parameterised equivariant quantum circuits. We prove that our construction is mathematically equivalent to other known constructions, such as those based on twirling and generalised permutations, but more direct to implement on quantum hardware. The efficacy of our constructed circuits is tested by solving the ground state problem of SU(2) symmetric Heisenberg models on the one-dimensional triangular lattice and on the Kagome lattice. Our results highlight that our equivariant circuits boost the performance of quantum variational algorithms, indicating broader applicability to other real-world problems.  ( 2 min )
    TempEE: Temporal-Spatial Parallel Transformer for Radar Echo Extrapolation Beyond Auto-Regression. (arXiv:2304.14131v2 [eess.SP] UPDATED)
    Meteorological radar reflectivity data (i.e. radar echo) significantly influences precipitation prediction. It can facilitate accurate and expeditious forecasting of short-term heavy rainfall bypassing the need for complex Numerical Weather Prediction (NWP) models. In comparison to conventional models, Deep Learning (DL)-based radar echo extrapolation algorithms exhibit higher effectiveness and efficiency. Nevertheless, the development of reliable and generalized echo extrapolation algorithm is impeded by three primary challenges: cumulative error spreading, imprecise representation of sparsely distributed echoes, and inaccurate description of non-stationary motion processes. To tackle these challenges, this paper proposes a novel radar echo extrapolation algorithm called Temporal-Spatial Parallel Transformer, referred to as TempEE. TempEE avoids using auto-regression and instead employs a one-step forward strategy to prevent cumulative error spreading during the extrapolation process. Additionally, we propose the incorporation of a Multi-level Temporal-Spatial Attention mechanism to improve the algorithm's capability of capturing both global and local information while emphasizing task-related regions, including sparse echo representations, in an efficient manner. Furthermore, the algorithm extracts spatio-temporal representations from continuous echo images using a parallel encoder to model the non-stationary motion process for echo extrapolation. The superiority of our TempEE has been demonstrated in the context of the classic radar echo extrapolation task, utilizing a real-world dataset. Extensive experiments have further validated the efficacy and indispensability of various components within TempEE.  ( 3 min )
    Survival Estimation for Missing not at Random Censoring Indicators based on Copula Models. (arXiv:2009.01726v2 [stat.ML] UPDATED)
    In the presence of right-censored data with covariates, the conditional Kaplan-Meier estimator (also known as the Beran estimator) consistently estimates the conditional survival function of the random follow-up for the event of interest. However, a necessary condition is the unambiguous knowledge of whether each individual is censored or not, which may be incomplete in practice. We therefore propose a study of the Beran estimator when the censoring indicators are generic random variables and discuss necessary conditions for the efficiency of the Beran estimator. From this, we provide a new estimator for the conditional survival function with missing not at random (MNAR) censoring indicators based on a conditional copula model for the missingness mechanism. In addition to the theoretical results, we illustrate how the estimators work for small samples through a simulation study and show their practical applicability by analyzing synthetic and real data.  ( 2 min )
    Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?. (arXiv:2309.07602v1 [cs.IR])
    Recently sequential recommendations and next-item prediction task has become increasingly popular in the field of recommender systems. Currently, two state-of-the-art baselines are Transformer-based models SASRec and BERT4Rec. Over the past few years, there have been quite a few publications comparing these two algorithms and proposing new state-of-the-art models. In most of the publications, BERT4Rec achieves better performance than SASRec. But BERT4Rec uses cross-entropy over softmax for all items, while SASRec uses negative sampling and calculates binary cross-entropy loss for one positive and one negative item. In our work, we show that if both models are trained with the same loss, which is used by BERT4Rec, then SASRec will significantly outperform BERT4Rec both in terms of quality and training speed. In addition, we show that SASRec could be effectively trained with negative sampling and still outperform BERT4Rec, but the number of negative examples should be much larger than one.  ( 2 min )
    Solving Recurrence Relations using Machine Learning, with Application to Cost Analysis. (arXiv:2309.07259v1 [cs.PL])
    Automatic static cost analysis infers information about the resources used by programs without actually running them with concrete data, and presents such information as functions of input data sizes. Most of the analysis tools for logic programs (and other languages) are based on setting up recurrence relations representing (bounds on) the computational cost of predicates, and solving them to find closed-form functions that are equivalent to (or a bound on) them. Such recurrence solving is a bottleneck in current tools: many of the recurrences that arise during the analysis cannot be solved with current solvers, such as Computer Algebra Systems (CASs), so that specific methods for different classes of recurrences need to be developed. We address such a challenge by developing a novel, general approach for solving arbitrary, constrained recurrence relations, that uses machine-learning sparse regression techniques to guess a candidate closed-form function, and a combination of an SMT-solver and a CAS to check whether such function is actually a solution of the recurrence. We have implemented a prototype and evaluated it with recurrences generated by a cost analysis system (the one in CiaoPP). The experimental results are quite promising, showing that our approach can find closed-form solutions, in a reasonable time, for classes of recurrences that cannot be solved by such a system, nor by current CASs.  ( 3 min )
    Communication Efficient Private Federated Learning Using Dithering. (arXiv:2309.07809v1 [cs.LG])
    The task of preserving privacy while ensuring efficient communication is a fundamental challenge in federated learning. In this work, we tackle this challenge in the trusted aggregator model, and propose a solution that achieves both objectives simultaneously. We show that employing a quantization scheme based on subtractive dithering at the clients can effectively replicate the normal noise addition process at the aggregator. This implies that we can guarantee the same level of differential privacy against other clients while substantially reducing the amount of communication required, as opposed to transmitting full precision gradients and using central noise addition. We also experimentally demonstrate that the accuracy of our proposed approach matches that of the full precision gradient method.  ( 2 min )
    Hodge-Aware Contrastive Learning. (arXiv:2309.07364v1 [cs.LG])
    Simplicial complexes prove effective in modeling data with multiway dependencies, such as data defined along the edges of networks or within other higher-order structures. Their spectrum can be decomposed into three interpretable subspaces via the Hodge decomposition, resulting foundational in numerous applications. We leverage this decomposition to develop a contrastive self-supervised learning approach for processing simplicial data and generating embeddings that encapsulate specific spectral information.Specifically, we encode the pertinent data invariances through simplicial neural networks and devise augmentations that yield positive contrastive examples with suitable spectral properties for downstream tasks. Additionally, we reweight the significance of negative examples in the contrastive loss, considering the similarity of their Hodge components to the anchor. By encouraging a stronger separation among less similar instances, we obtain an embedding space that reflects the spectral properties of the data. The numerical results on two standard edge flow classification tasks show a superior performance even when compared to supervised learning techniques. Our findings underscore the importance of adopting a spectral perspective for contrastive learning with higher-order data.  ( 2 min )
    Variational Quantum Linear Solver enhanced Quantum Support Vector Machine. (arXiv:2309.07770v1 [quant-ph])
    Quantum Support Vector Machines (QSVM) play a vital role in using quantum resources for supervised machine learning tasks, such as classification. However, current methods are strongly limited in terms of scalability on Noisy Intermediate Scale Quantum (NISQ) devices. In this work, we propose a novel approach called the Variational Quantum Linear Solver (VQLS) enhanced QSVM. This is built upon our idea of utilizing the variational quantum linear solver to solve system of linear equations of a least squares-SVM on a NISQ device. The implementation of our approach is evaluated by an extensive series of numerical experiments with the Iris dataset, which consists of three distinct iris plant species. Based on this, we explore the practicality and effectiveness of our algorithm by constructing a classifier capable of classification in a feature space ranging from one to seven dimensions. Furthermore, by strategically exploiting both classical and quantum computing for various subroutines of our algorithm, we effectively mitigate practical challenges associated with the implementation. These include significant improvement in the trainability of the variational ansatz and notable reductions in run-time for cost calculations. Based on the numerical experiments, our approach exhibits the capability of identifying a separating hyperplane in an 8-dimensional feature space. Moreover, it consistently demonstrated strong performance across various instances with the same dataset.  ( 2 min )
    When Are Two Lists Better than One?: Benefits and Harms in Joint Decision-making. (arXiv:2308.11721v2 [cs.LG] UPDATED)
    Historically, much of machine learning research has focused on the performance of the algorithm alone, but recently more attention has been focused on optimizing joint human-algorithm performance. Here, we analyze a specific type of human-algorithm collaboration where the algorithm has access to a set of $n$ items, and presents a subset of size $k$ to the human, who selects a final item from among those $k$. This scenario could model content recommendation, route planning, or any type of labeling task. Because both the human and algorithm have imperfect, noisy information about the true ordering of items, the key question is: which value of $k$ maximizes the probability that the best item will be ultimately selected? For $k=1$, performance is optimized by the algorithm acting alone, and for $k=n$ it is optimized by the human acting alone. Surprisingly, we show that for multiple of noise models, it is optimal to set $k \in [2, n-1]$ - that is, there are strict benefits to collaborating, even when the human and algorithm have equal accuracy separately. We demonstrate this theoretically for the Mallows model and experimentally for the Random Utilities models of noisy permutations. However, we show this pattern is reversed when the human is anchored on the algorithm's presented ordering - the joint system always has strictly worse performance. We extend these results to the case where the human and algorithm differ in their accuracy levels, showing that there always exist regimes where a more accurate agent would strictly benefit from collaborating with a less accurate one, but these regimes are asymmetric between the human and the algorithm's accuracy.  ( 3 min )
    Predicting Survival Time of Ball Bearings in the Presence of Censoring. (arXiv:2309.07188v1 [eess.SP])
    Ball bearings find widespread use in various manufacturing and mechanical domains, and methods based on machine learning have been widely adopted in the field to monitor wear and spot defects before they lead to failures. Few studies, however, have addressed the problem of censored data, in which failure is not observed. In this paper, we propose a novel approach to predict the time to failure in ball bearings using survival analysis. First, we analyze bearing data in the frequency domain and annotate when a bearing fails by comparing the Kullback-Leibler divergence and the standard deviation between its break-in frequency bins and its break-out frequency bins. Second, we train several survival models to estimate the time to failure based on the annotated data and covariates extracted from the time domain, such as skewness, kurtosis and entropy. The models give a probabilistic prediction of risk over time and allow us to compare the survival function between groups of bearings. We demonstrate our approach on the XJTU and PRONOSTIA datasets. On XJTU, the best result is a 0.70 concordance-index and 0.21 integrated Brier score. On PRONOSTIA, the best is a 0.76 concordance-index and 0.19 integrated Brier score. Our work motivates further work on incorporating censored data in models for predictive maintenance.  ( 2 min )
    On a continuous time model of gradient descent dynamics and instability in deep learning. (arXiv:2302.01952v3 [stat.ML] UPDATED)
    The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.  ( 2 min )
    Latent Representation and Simulation of Markov Processes via Time-Lagged Information Bottleneck. (arXiv:2309.07200v1 [cs.LG])
    Markov processes are widely used mathematical models for describing dynamic systems in various fields. However, accurately simulating large-scale systems at long time scales is computationally expensive due to the short time steps required for accurate integration. In this paper, we introduce an inference process that maps complex systems into a simplified representational space and models large jumps in time. To achieve this, we propose Time-lagged Information Bottleneck (T-IB), a principled objective rooted in information theory, which aims to capture relevant temporal features while discarding high-frequency information to simplify the simulation task and minimize the inference error. Our experiments demonstrate that T-IB learns information-optimal representations for accurately modeling the statistical properties and dynamics of the original process at a selected time lag, outperforming existing time-lagged dimensionality reduction methods.  ( 2 min )
    Physically Plausible Full-Body Hand-Object Interaction Synthesis. (arXiv:2309.07907v1 [cs.RO])
    We propose a physics-based method for synthesizing dexterous hand-object interactions in a full-body setting. While recent advancements have addressed specific facets of human-object interactions, a comprehensive physics-based approach remains a challenge. Existing methods often focus on isolated segments of the interaction process and rely on data-driven techniques that may result in artifacts. In contrast, our proposed method embraces reinforcement learning (RL) and physics simulation to mitigate the limitations of data-driven approaches. Through a hierarchical framework, we first learn skill priors for both body and hand movements in a decoupled setting. The generic skill priors learn to decode a latent skill embedding into the motion of the underlying part. A high-level policy then controls hand-object interactions in these pretrained latent spaces, guided by task objectives of grasping and 3D target trajectory following. It is trained using a novel reward function that combines an adversarial style term with a task reward, encouraging natural motions while fulfilling the task incentives. Our method successfully accomplishes the complete interaction task, from approaching an object to grasping and subsequent manipulation. We compare our approach against kinematics-based baselines and show that it leads to more physically plausible motions.  ( 2 min )
    VerilogEval: Evaluating Large Language Models for Verilog Code Generation. (arXiv:2309.07544v1 [cs.LG])
    The increasing popularity of large language models (LLMs) has paved the way for their application in diverse domains. This paper proposes a benchmarking framework tailored specifically for evaluating LLM performance in the context of Verilog code generation for hardware design and verification. We present a comprehensive evaluation dataset consisting of 156 problems from the Verilog instructional website HDLBits. The evaluation set consists of a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines. The Verilog code completions can be automatically tested for functional correctness by comparing the transient simulation outputs of the generated design with a golden solution. We also demonstrate that the Verilog code generation capability of pretrained language models could be improved with supervised fine-tuning by bootstrapping with LLM generated synthetic problem-code pairs.  ( 2 min )
    Machine Learning and Computer Vision Techniques in Continuous Beehive Monitoring Applications: A survey. (arXiv:2208.00085v3 [cs.CV] UPDATED)
    Wide use and availability of the machine learning and computer vision techniques allows development of relatively complex monitoring systems in many domains. Besides the traditional industrial domain, new application appears also in biology and agriculture, where we could speak about the detection of infections, parasites and weeds, but also about automated monitoring and early warning systems. This is also connected with the introduction of the easily accessible hardware and development kits such as Arduino, or RaspberryPi family. In this paper, we survey 50 existing papers focusing on the methods of automated beehive monitoring methods using the computer vision techniques, particularly on the pollen and Varroa mite detection together with the bee traffic monitoring. Such systems could also be used for the monitoring of the honeybee colonies and for the inspection of their health state, which could identify potentially dangerous states before the situation is critical, or to better plan periodic bee colony inspections and therefore save significant costs. Later, we also include analysis of the research trends in this application field and we outline the possible direction of the new explorations. Our paper is aimed also at veterinary and apidology professionals and experts, who might not be familiar with machine learning to introduce them to its possibilities, therefore each family of applications is opened by a brief theoretical introduction and motivation related to its base method. We hope that this paper will inspire other scientists to use machine learning techniques for other applications in beehive monitoring.  ( 3 min )
    Kernel Conditional Moment Constraints for Confounding Robust Inference. (arXiv:2302.13348v2 [stat.ML] UPDATED)
    We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However, existing work often resorts to some coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimation of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound of the policy value. It can be shown that our estimator contains the recently proposed sharp estimator by Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence. To construct our estimator, we leverage the kernel method to obtain a tractable approximation to the conditional moment constraints, which traditional non-sharp estimators failed to take into account. In the theoretical analysis, we provide a condition for the choice of the kernel which guarantees no specification error that biases the lower bound estimation. Furthermore, we provide consistency guarantees of policy evaluation and learning. In the experiments with synthetic and real-world data, we demonstrate the effectiveness of the proposed method.  ( 2 min )
    EarthPT: a foundation model for Earth Observation. (arXiv:2309.07207v1 [cs.LG])
    We introduce EarthPT -- an Earth Observation (EO) pretrained transformer. EarthPT is a 700 million parameter decoding transformer foundation model trained in an autoregressive self-supervised manner and developed specifically with EO use-cases in mind. We demonstrate that EarthPT is an effective forecaster that can accurately predict future pixel-level surface reflectances across the 400-2300 nm range well into the future. For example, forecasts of the evolution of the Normalised Difference Vegetation Index (NDVI) have a typical error of approximately 0.05 (over a natural range of -1 -> 1) at the pixel level over a five month test set horizon, out-performing simple phase-folded models based on historical averaging. We also demonstrate that embeddings learnt by EarthPT hold semantically meaningful information and could be exploited for downstream tasks such as highly granular, dynamic land use classification. Excitingly, we note that the abundance of EO data provides us with -- in theory -- quadrillions of training tokens. Therefore, if we assume that EarthPT follows neural scaling laws akin to those derived for Large Language Models (LLMs), there is currently no data-imposed limit to scaling EarthPT and other similar `Large Observation Models.'  ( 2 min )
    Learning Quasi-Static 3D Models of Markerless Deformable Linear Objects for Bimanual Robotic Manipulation. (arXiv:2309.07609v1 [cs.RO])
    The robotic manipulation of Deformable Linear Objects (DLOs) is a vital and challenging task that is important in many practical applications. Classical model-based approaches to this problem require an accurate model to capture how robot motions affect the deformation of the DLO. Nowadays, data-driven models offer the best tradeoff between quality and computation time. This paper analyzes several learning-based 3D models of the DLO and proposes a new one based on the Transformer architecture that achieves superior accuracy, even on the DLOs of different lengths, thanks to the proposed scaling method. Moreover, we introduce a data augmentation technique, which improves the prediction performance of almost all considered DLO data-driven models. Thanks to this technique, even a simple Multilayer Perceptron (MLP) achieves close to state-of-the-art performance while being significantly faster to evaluate. In the experiments, we compare the performance of the learning-based 3D models of the DLO on several challenging datasets quantitatively and demonstrate their applicability in the task of shaping a DLO.  ( 2 min )
    PolicyCleanse: Backdoor Detection and Mitigation in Reinforcement Learning. (arXiv:2202.03609v5 [cs.LG] UPDATED)
    While real-world applications of reinforcement learning are becoming popular, the security and robustness of RL systems are worthy of more attention and exploration. In particular, recent works have revealed that, in a multi-agent RL environment, backdoor trigger actions can be injected into a victim agent (a.k.a. Trojan agent), which can result in a catastrophic failure as soon as it sees the backdoor trigger action. To ensure the security of RL agents against malicious backdoors, in this work, we propose the problem of Backdoor Detection in a multi-agent competitive reinforcement learning system, with the objective of detecting Trojan agents as well as the corresponding potential trigger actions, and further trying to mitigate their Trojan behavior. In order to solve this problem, we propose PolicyCleanse that is based on the property that the activated Trojan agents accumulated rewards degrade noticeably after several timesteps. Along with PolicyCleanse, we also design a machine unlearning-based approach that can effectively mitigate the detected backdoor. Extensive experiments demonstrate that the proposed methods can accurately detect Trojan agents, and outperform existing backdoor mitigation baseline approaches by at least 3% in winning rate across various types of agents and environments.  ( 3 min )
    Learning From Drift: Federated Learning on Non-IID Data via Drift Regularization. (arXiv:2309.07189v1 [cs.LG])
    Federated learning algorithms perform reasonably well on independent and identically distributed (IID) data. They, on the other hand, suffer greatly from heterogeneous environments, i.e., Non-IID data. Despite the fact that many research projects have been done to address this issue, recent findings indicate that they are still sub-optimal when compared to training on IID data. In this work, we carefully analyze the existing methods in heterogeneous environments. Interestingly, we find that regularizing the classifier's outputs is quite effective in preventing performance degradation on Non-IID data. Motivated by this, we propose Learning from Drift (LfD), a novel method for effectively training the model in heterogeneous settings. Our scheme encapsulates two key components: drift estimation and drift regularization. Specifically, LfD first estimates how different the local model is from the global model (i.e., drift). The local model is then regularized such that it does not fall in the direction of the estimated drift. In the experiment, we evaluate each method through the lens of the five aspects of federated learning, i.e., Generalization, Heterogeneity, Scalability, Forgetting, and Efficiency. Comprehensive evaluation results clearly support the superiority of LfD in federated learning with Non-IID data.  ( 2 min )
    Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech. (arXiv:2110.12539v3 [cs.SD] UPDATED)
    We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE) architecture using a split vector quantizer for NTTS, as an enhancement to the well-known Variational Autoencoder (VAE) and Vector Quantized Variational Autoencoder (VQ-VAE) architectures. Compared to these previous architectures, our proposed model retains the benefits of using an utterance-level bottleneck, while keeping significant representation power and a discretized latent space small enough for efficient prediction from text. We train the model on recordings in the expressive task-oriented dialogues domain and show that SVQ-VAE achieves a statistically significant improvement in naturalness over the VAE and VQ-VAE models. Furthermore, we demonstrate that the SVQ-VAE latent acoustic space is predictable from text, reducing the gap between the standard constant vector synthesis and vocoded recordings by 32%.  ( 2 min )
    Using wearable device-based machine learning models to autonomously identify older adults with poor cognition. (arXiv:2309.07133v1 [eess.SP])
    Conducting cognitive tests is time-consuming for patients and clinicians. Wearable device-based prediction models allow for continuous health monitoring under normal living conditions and could offer an alternative to identifying older adults with cognitive impairments for early interventions. In this study, we first derived novel wearable-based features related to circadian rhythms, ambient light exposure, physical activity levels, sleep, and signal processing. Then, we quantified the ability of wearable-based machine-learning models to predict poor cognition based on outcomes from the Digit Symbol Substitution Test (DSST), the Consortium to Establish a Registry for Alzheimers Disease Word-Learning subtest (CERAD-WL), and the Animal Fluency Test (AFT). We found that the wearable-based models had significantly higher AUCs when predicting all three cognitive outcomes compared to benchmark models containing age, sex, education, marital status, household income, diabetic status, depression symptoms, and functional independence scores. In addition to uncovering previously unidentified wearable-based features that are predictive of poor cognition such as the standard deviation of the midpoints of each persons most active 10-hour periods and least active 5-hour periods, our paper provides proof-of-concept that wearable-based machine learning models can be used to autonomously screen older adults for possible cognitive impairments. Such models offer cost-effective alternatives to conducting initial screenings manually in clinical settings.  ( 2 min )
    MAHTM: A Multi-Agent Framework for Hierarchical Transactive Microgrids. (arXiv:2303.08447v2 [cs.LG] UPDATED)
    Integrating variable renewable energy into the grid has posed challenges to system operators in achieving optimal trade-offs among energy availability, cost affordability, and pollution controllability. This paper proposes a multi-agent reinforcement learning framework for managing energy transactions in microgrids. The framework addresses the challenges above: it seeks to optimize the usage of available resources by minimizing the carbon footprint while benefiting all stakeholders. The proposed architecture consists of three layers of agents, each pursuing different objectives. The first layer, comprised of prosumers and consumers, minimizes the total energy cost. The other two layers control the energy price to decrease the carbon impact while balancing the consumption and production of both renewable and conventional energy. This framework also takes into account fluctuations in energy demand and supply.  ( 2 min )
    BAFFLE: Backdoor Attack in Offline Reinforcement Learning. (arXiv:2210.04688v2 [cs.LG] UPDATED)
    A growing body of research has focused on the Reinforcement Learning (RL) methods which allow the agent to learn from trial-and-error experiences gathered during the interaction with the environment. Recently, offline RL becomes a popular RL paradigm because it saves the interactions with environments. In offline RL, data providers share large pre-collected datasets, and others can train high-quality agents without interacting with the environments. This paradigm has demonstrated effectiveness in critical tasks like robot control, autonomous driving, etc. However, less attention is paid to investigating the security threats to the offline RL system. This paper focuses on backdoor attacks, where some perturbations are added to the data (observations) such that given normal observations, the agent takes high-rewards actions, and low-reward actions on observations injected with triggers. In this paper, we propose Baffle (Backdoor Attack for Offline Reinforcement Learning), an approach that automatically implants backdoors to RL agents by poisoning the offline RL dataset, and evaluate how different offline RL algorithms react to this attack. Our experiments conducted on four tasks and four offline RL algorithms expose a disquieting fact: none of the existing offline RL algorithms is immune to such a backdoor attack. Baffle modifies $10\%$ of the datasets for four tasks. Agents trained on the poisoned datasets perform well in normal settings. However, when triggers are presented, the agents' performance decreases drastically by $63.2\%$, $53.9\%$, $64.7\%$, and $47.4\%$ in the four tasks on average. The backdoor still persists after fine-tuning poisoned agents on clean datasets. We further show that the inserted backdoor is also hard to be detected by a popular defensive method. This paper calls attention to developing more effective protection for the open-source offline RL dataset.  ( 3 min )
    Gaussian Process Surrogate Models for Neural Networks. (arXiv:2208.06028v2 [cs.LG] UPDATED)
    Not being able to understand and predict the behavior of deep learning systems makes it hard to decide what architecture and algorithm to use for a given problem. In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque. Modeling replaces a complex system with a simpler, more interpretable surrogate. Drawing inspiration from this, we construct a class of surrogate models for neural networks using Gaussian processes. Rather than deriving kernels for infinite neural networks, we learn kernels empirically from the naturalistic behavior of finite neural networks. We demonstrate our approach captures existing phenomena related to the spectral bias of neural networks, and then show that our surrogate models can be used to solve practical problems such as identifying which points most influence the behavior of specific neural networks and predicting which architectures and algorithms will generalize well for specific datasets.  ( 2 min )
    Some notes concerning a generalized KMM-type optimization method for density ratio estimation. (arXiv:2309.07887v1 [cs.LG])
    In the present paper we introduce new optimization algorithms for the task of density ratio estimation. More precisely, we consider extending the well-known KMM method using the construction of a suitable loss function, in order to encompass more general situations involving the estimation of density ratio with respect to subsets of the training data and test data, respectively. The associated codes can be found at https://github.com/CDAlecsa/Generalized-KMM.  ( 2 min )
    Correcting sampling biases via importance reweighting for spatial modeling. (arXiv:2309.04824v2 [cs.LG] UPDATED)
    In machine learning models, the estimation of errors is often complex due to distribution bias, particularly in spatial data such as those found in environmental studies. We introduce an approach based on the ideas of importance sampling to obtain an unbiased estimate of the target error. By taking into account difference between desirable error and available data, our method reweights errors at each sample point and neutralizes the shift. Importance sampling technique and kernel density estimation were used for reweighteing. We validate the effectiveness of our approach using artificial data that resemble real-world spatial datasets. Our findings demonstrate advantages of the proposed approach for the estimation of the target error, offering a solution to a distribution shift problem. Overall error of predictions dropped from 7% to just 2% and it gets smaller for larger samples.  ( 2 min )
    Domain Generalization for Crop Segmentation with Knowledge Distillation. (arXiv:2304.01029v2 [cs.CV] UPDATED)
    In recent years, precision agriculture has gradually oriented farming closer to automation processes to support all the activities related to field management. Service robotics plays a predominant role in this evolution by deploying autonomous agents that can navigate fields while performing tasks without human intervention, such as monitoring, spraying, and harvesting. To execute these precise actions, mobile robots need a real-time perception system that understands their surroundings and identifies their targets in the wild. Generalizing to new crops and environmental conditions is critical for practical applications, as labeled samples are rarely available. In this paper, we investigate the problem of crop segmentation and propose a novel approach to enhance domain generalization using knowledge distillation. In the proposed framework, we transfer knowledge from an ensemble of models individually trained on source domains to a student model that can adapt to unseen target domains. To evaluate the proposed method, we present a synthetic multi-domain dataset for crop segmentation containing plants of variegate shapes and covering different terrain styles, weather conditions, and light scenarios for more than 50,000 samples. We demonstrate significant improvements in performance over state-of-the-art methods and superior sim-to-real generalization. Our approach provides a promising solution for domain generalization in crop segmentation and has the potential to enhance a wide variety of precision agriculture applications.  ( 2 min )
    Adaptive approximation of monotone functions. (arXiv:2309.07530v1 [cs.LG])
    We study the classical problem of approximating a non-decreasing function $f: \mathcal{X} \to \mathcal{Y}$ in $L^p(\mu)$ norm by sequentially querying its values, for known compact real intervals $\mathcal{X}$, $\mathcal{Y}$ and a known probability measure $\mu$ on $\cX$. For any function~$f$ we characterize the minimum number of evaluations of $f$ that algorithms need to guarantee an approximation $\hat{f}$ with an $L^p(\mu)$ error below $\epsilon$ after stopping. Unlike worst-case results that hold uniformly over all $f$, our complexity measure is dependent on each specific function $f$. To address this problem, we introduce GreedyBox, a generalization of an algorithm originally proposed by Novak (1992) for numerical integration. We prove that GreedyBox achieves an optimal sample complexity for any function $f$, up to logarithmic factors. Additionally, we uncover results regarding piecewise-smooth functions. Perhaps as expected, the $L^p(\mu)$ error of GreedyBox decreases much faster for piecewise-$C^2$ functions than predicted by the algorithm (without any knowledge on the smoothness of $f$). A simple modification even achieves optimal minimax approximation rates for such functions, which we compute explicitly. In particular, our findings highlight multiple performance gaps between adaptive and non-adaptive algorithms, smooth and piecewise-smooth functions, as well as monotone or non-monotone functions. Finally, we provide numerical experiments to support our theoretical results.  ( 2 min )
    ConSpec: honing in on critical steps for rapid learning and generalization in RL. (arXiv:2210.05845v5 [cs.LG] UPDATED)
    In real life, success is often contingent upon multiple critical steps that are distant in time from each other and from the final reward. These critical steps are challenging to identify with traditional reinforcement learning (RL) methods that rely on the Bellman equation for credit assignment. Here, we present a new RL algorithm that uses offline contrastive learning to hone in on critical steps. This algorithm, which we call contrastive introspection (ConSpec), can be added to any existing RL algorithm. ConSpec learns a set of prototypes for the critical steps in a task by a novel contrastive loss and delivers an intrinsic reward when the current state matches one of these prototypes. The prototypes in ConSpec provide two key benefits for credit assignment: (1) They enable rapid identification of all the critical steps. (2) They do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. Distinct from other contemporary RL approaches to credit assignment, ConSpec takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon than it is to prospectively predict reward at every step taken in the environment. Altogether, ConSpec improves learning in a diverse set of RL tasks, including both those with explicit, discrete critical steps and those with complex, continuous critical steps.  ( 3 min )
    A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time. (arXiv:2309.07418v1 [cs.DS])
    Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$. Here $\mathsf{A} \in \mathbb{R}^{n^2 \times d^2}$ is Kronecker product between $A_1 \in \mathbb{R}^{n \times d}$ and $A_2 \in \mathbb{R}^{n \times d}$. $A_3$ is a matrix in $\mathbb{R}^{n \times d}$, $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$. The $X, Y \in \mathbb{R}^{d \times d}$ are variables we want to learn. $B \in \mathbb{R}^{n \times d}$ and $b_{j_0,i_0} \in \mathbb{R}$ is one entry at $j_0$-th row and $i_0$-th column of $B$, $Y_{*,i_0} \in \mathbb{R}^d$ is the $i_0$-column vector of $Y$, and $x \in \mathbb{R}^{d^2}$ is the vectorization of $X$. In a multi-layer LLM network, the matrix $B \in \mathbb{R}^{n \times d}$ can be viewed as the output of a layer, and $A_1= A_2 = A_3 \in \mathbb{R}^{n \times d}$ can be viewed as the input of a layer. The matrix version of $x$ can be viewed as $QK^\top$ and $Y$ can be viewed as $V$. We provide an iterative greedy algorithm to train loss function $L(X,Y)$ up $\epsilon$ that runs in $\widetilde{O}( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2\omega}) \log(1/\epsilon) )$ time. Here ${\cal T}_{\mathrm{mat}}(a,b,c)$ denotes the time of multiplying $a \times b$ matrix another $b \times c$ matrix, and $\omega\approx 2.37$ denotes the exponent of matrix multiplication.  ( 3 min )
    HurriCast: An Automatic Framework Using Machine Learning and Statistical Modeling for Hurricane Forecasting. (arXiv:2309.07174v1 [cs.LG])
    Hurricanes present major challenges in the U.S. due to their devastating impacts. Mitigating these risks is important, and the insurance industry is central in this effort, using intricate statistical models for risk assessment. However, these models often neglect key temporal and spatial hurricane patterns and are limited by data scarcity. This study introduces a refined approach combining the ARIMA model and K-MEANS to better capture hurricane trends, and an Autoencoder for enhanced hurricane simulations. Our experiments show that this hybrid methodology effectively simulate historical hurricane behaviors while providing detailed projections of potential future trajectories and intensities. Moreover, by leveraging a comprehensive yet selective dataset, our simulations enrich the current understanding of hurricane patterns and offer actionable insights for risk management strategies.  ( 2 min )
    The kernel-balanced equation for deep neural networks. (arXiv:2309.07367v1 [cond-mat.dis-nn])
    Deep neural networks have shown many fruitful applications in this decade. A network can get the generalized function through training with a finite dataset. The degree of generalization is a realization of the proximity scale in the data space. Specifically, the scale is not clear if the dataset is complicated. Here we consider a network for the distribution estimation of the dataset. We show the estimation is unstable and the instability depends on the data density and training duration. We derive the kernel-balanced equation, which gives a short phenomenological description of the solution. The equation tells us the reason for the instability and the mechanism of the scale. The network outputs a local average of the dataset as a prediction and the scale of averaging is determined along the equation. The scale gradually decreases along training and finally results in instability in our case.  ( 2 min )
    LambdaKG: A Library for Pre-trained Language Model-Based Knowledge Graph Embeddings. (arXiv:2210.00305v3 [cs.CL] UPDATED)
    Knowledge Graphs (KGs) often have two characteristics: heterogeneous graph structure and text-rich entity/relation information. Text-based KG embeddings can represent entities by encoding descriptions with pre-trained language models, but no open-sourced library is specifically designed for KGs with PLMs at present. In this paper, we present LambdaKG, a library for KGE that equips with many pre-trained language models (e.g., BERT, BART, T5, GPT-3), and supports various tasks (e.g., knowledge graph completion, question answering, recommendation, and knowledge probing). LambdaKG is publicly open-sourced at https://github.com/zjunlp/PromptKG/tree/main/lambdaKG, with a demo video at this http URL and long-term maintenance.  ( 2 min )
    Model-free Learning of Regions of Attraction via Recurrent Sets. (arXiv:2204.10372v2 [cs.LG] UPDATED)
    We consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point without an explicit model of the dynamics. Rather than leveraging approximate models with bounded uncertainty to find a (robust) invariant set contained in the ROA, we propose to learn sets that satisfy a more relaxed notion of containment known as recurrence. We define a set to be $\tau$-recurrent (resp. $k$-recurrent) if every trajectory that starts within the set, returns to it after at most $\tau$ seconds (resp. $k$ steps). We show that under mild assumptions a $\tau$-recurrent set containing a stable equilibrium must be a subset of its ROA. We then leverage this property to develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Our algorithms process samples sequentially, which allow them to continue being executed even after an initial offline training stage. We further provide an upper bound on the number of counter-examples used by the algorithm, and almost sure convergence guarantees.  ( 2 min )
    Neural Categorical Priors for Physics-Based Character Control. (arXiv:2308.07200v2 [cs.GR] UPDATED)
    Recent advances in learning reusable motion priors have demonstrated their effectiveness in generating naturalistic behaviors. In this paper, we propose a new learning framework in this paradigm for controlling physics-based characters with significantly improved motion quality and diversity over existing state-of-the-art methods. The proposed method uses reinforcement learning (RL) to initially track and imitate life-like movements from unstructured motion clips using the discrete information bottleneck, as adopted in the Vector Quantized Variational AutoEncoder (VQ-VAE). This structure compresses the most relevant information from the motion clips into a compact yet informative latent space, i.e., a discrete space over vector quantized codes. By sampling codes in the space from a trained categorical prior distribution, high-quality life-like behaviors can be generated, similar to the usage of VQ-VAE in computer vision. Although this prior distribution can be trained with the supervision of the encoder's output, it follows the original motion clip distribution in the dataset and could lead to imbalanced behaviors in our setting. To address the issue, we further propose a technique named prior shifting to adjust the prior distribution using curiosity-driven RL. The outcome distribution is demonstrated to offer sufficient behavioral diversity and significantly facilitates upper-level policy learning for downstream tasks. We conduct comprehensive experiments using humanoid characters on two challenging downstream tasks, sword-shield striking and two-player boxing game. Our results demonstrate that the proposed framework is capable of controlling the character to perform considerably high-quality movements in terms of behavioral strategies, diversity, and realism. Videos, codes, and data are available at https://tencent-roboticsx.github.io/NCP/.  ( 3 min )
    Feature Engineering in Learning-to-Rank for Community Question Answering Task. (arXiv:2309.07610v1 [cs.LG])
    Community question answering (CQA) forums are Internet-based platforms where users ask questions about a topic and other expert users try to provide solutions. Many CQA forums such as Quora, Stackoverflow, Yahoo!Answer, StackExchange exist with a lot of user-generated data. These data are leveraged in automated CQA ranking systems where similar questions (and answers) are presented in response to the query of the user. In this work, we empirically investigate a few aspects of this domain. Firstly, in addition to traditional features like TF-IDF, BM25 etc., we introduce a BERT-based feature that captures the semantic similarity between the question and answer. Secondly, most of the existing research works have focused on features extracted only from the question part; features extracted from answers have not been explored extensively. We combine both types of features in a linear fashion. Thirdly, using our proposed concepts, we conduct an empirical investigation with different rank-learning algorithms, some of which have not been used so far in CQA domain. On three standard CQA datasets, our proposed framework achieves state-of-the-art performance. We also analyze importance of the features we use in our investigation. This work is expected to guide the practitioners to select a better set of features for the CQA retrieval task.  ( 2 min )
    Using Unsupervised and Supervised Learning and Digital Twin for Deep Convective Ice Storm Classification. (arXiv:2309.07173v1 [cs.LG])
    Smart Ice Cloud Sensing (SMICES) is a small-sat concept in which a primary radar intelligently targets ice storms based on information collected by a lookahead radiometer. Critical to the intelligent targeting is accurate identification of storm/cloud types from eight bands of radiance collected by the radiometer. The cloud types of interest are: clear sky, thin cirrus, cirrus, rainy anvil, and convection core. We describe multi-step use of Machine Learning and Digital Twin of the Earth's atmosphere to derive such a classifier. First, a digital twin of Earth's atmosphere called a Weather Research Forecast (WRF) is used generate simulated lookahead radiometer data as well as deeper "science" hidden variables. The datasets simulate a tropical region over the Caribbean and a non-tropical region over the Atlantic coast of the United States. A K-means clustering over the scientific hidden variables was utilized by human experts to generate an automatic labelling of the data - mapping each physical data point to cloud types by scientists informed by mean/centroids of hidden variables of the clusters. Next, classifiers were trained with the inputs of the simulated radiometer data and its corresponding label. The classifiers of a random decision forest (RDF), support vector machine (SVM), Gaussian na\"ive bayes, feed forward artificial neural network (ANN), and a convolutional neural network (CNN) were trained. Over the tropical dataset, the best performing classifier was able to identify non-storm and storm clouds with over 80% accuracy in each class for a held-out test set. Over the non-tropical dataset, the best performing classifier was able to classify non-storm clouds with over 90% accuracy and storm clouds with over 40% accuracy. Additionally both sets of classifiers were shown to be resilient to instrument noise.  ( 3 min )
    Goal Space Abstraction in Hierarchical Reinforcement Learning via Reachability Analysis. (arXiv:2309.07168v1 [cs.LG])
    Open-ended learning benefits immensely from the use of symbolic methods for goal representation as they offer ways to structure knowledge for efficient and transferable learning. However, the existing Hierarchical Reinforcement Learning (HRL) approaches relying on symbolic reasoning are often limited as they require a manual goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this work, we propose a developmental mechanism for subgoal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We create a HRL algorithm that gradually learns this representation along with the policies and evaluate it on navigation tasks to show the learned representation is interpretable and results in data efficiency.  ( 2 min )
    CloudBrain-NMR: An Intelligent Cloud Computing Platform for NMR Spectroscopy Processing, Reconstruction and Analysis. (arXiv:2309.07178v1 [q-bio.QM])
    Nuclear Magnetic Resonance (NMR) spectroscopy has served as a powerful analytical tool for studying molecular structure and dynamics in chemistry and biology. However, the processing of raw data acquired from NMR spectrometers and subsequent quantitative analysis involves various specialized tools, which necessitates comprehensive knowledge in programming and NMR. Particularly, the emerging deep learning tools is hard to be widely used in NMR due to the sophisticated setup of computation. Thus, NMR processing is not an easy task for chemist and biologists. In this work, we present CloudBrain-NMR, an intelligent online cloud computing platform designed for NMR data reading, processing, reconstruction, and quantitative analysis. The platform is conveniently accessed through a web browser, eliminating the need for any program installation on the user side. CloudBrain-NMR uses parallel computing with graphics processing units and central processing units, resulting in significantly shortened computation time. Furthermore, it incorporates state-of-the-art deep learning-based algorithms offering comprehensive functionalities that allow users to complete the entire processing procedure without relying on additional software. This platform has empowered NMR applications with advanced artificial intelligence processing. CloudBrain-NMR is openly accessible for free usage at https://csrc.xmu.edu.cn/CloudBrain.html  ( 2 min )
    Market-GAN: Adding Control to Financial Market Data Generation with Semantic Context. (arXiv:2309.07708v1 [cs.LG])
    Financial simulators play an important role in enhancing forecasting accuracy, managing risks, and fostering strategic financial decision-making. Despite the development of financial market simulation methodologies, existing frameworks often struggle with adapting to specialized simulation context. We pinpoint the challenges as i) current financial datasets do not contain context labels; ii) current techniques are not designed to generate financial data with context as control, which demands greater precision compared to other modalities; iii) the inherent difficulties in generating context-aligned, high-fidelity data given the non-stationary, noisy nature of financial data. To address these challenges, our contributions are: i) we proposed the Contextual Market Dataset with market dynamics, stock ticker, and history state as context, leveraging a market dynamics modeling method that combines linear regression and Dynamic Time Warping clustering to extract market dynamics; ii) we present Market-GAN, a novel architecture incorporating a Generative Adversarial Networks (GAN) for the controllable generation with context, an autoencoder for learning low-dimension features, and supervisors for knowledge transfer; iii) we introduce a two-stage training scheme to ensure that Market-GAN captures the intrinsic market distribution with multiple objectives. In the pertaining stage, with the use of the autoencoder and supervisors, we prepare the generator with a better initialization for the adversarial training stage. We propose a set of holistic evaluation metrics that consider alignment, fidelity, data usability on downstream tasks, and market facts. We evaluate Market-GAN with the Dow Jones Industrial Average data from 2000 to 2023 and showcase superior performance in comparison to 4 state-of-the-art time-series generative models.  ( 3 min )
    Traveling Words: A Geometric Interpretation of Transformers. (arXiv:2309.07315v1 [cs.CL])
    Transformers have significantly advanced the field of natural language processing, but comprehending their internal mechanisms remains a challenge. In this paper, we introduce a novel geometric perspective that elucidates the inner mechanisms of transformer operations. Our primary contribution is illustrating how layer normalization confines the latent features to a hyper-sphere, subsequently enabling attention to mold the semantic representation of words on this surface. This geometric viewpoint seamlessly connects established properties such as iterative refinement and contextual embeddings. We validate our insights by probing a pre-trained 124M parameter GPT-2 model. Our findings reveal clear query-key attention patterns in early layers and build upon prior observations regarding the subject-specific nature of attention heads at deeper layers. Harnessing these geometric insights, we present an intuitive understanding of transformers, depicting them as processes that model the trajectory of word particles along the hyper-sphere.  ( 2 min )
  • Open

    Random Feature Amplification: Feature Learning and Generalization in Neural Networks. (arXiv:2202.07626v4 [cs.LG] UPDATED)
    In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.
    Learning nonparametric DAGs with incremental information via high-order HSIC. (arXiv:2308.05969v2 [cs.LG] UPDATED)
    Score-based methods for learning Bayesain networks(BN) aim to maximizing the global score functions. However, if local variables have direct and indirect dependence simultaneously, the global optimization on score functions misses edges between variables with indirect dependent relationship, of which scores are smaller than those with direct dependent relationship. In this paper, we present an identifiability condition based on a determined subset of parents to identify the underlying DAG. By the identifiability condition, we develop a two-phase algorithm namely optimal-tuning (OT) algorithm to locally amend the global optimization. In the optimal phase, an optimization problem based on first-order Hilbert-Schmidt independence criterion (HSIC) gives an estimated skeleton as the initial determined parents subset. In the tuning phase, the skeleton is locally tuned by deletion, addition and DAG-formalization strategies using the theoretically proved incremental properties of high-order HSIC. Numerical experiments for different synthetic datasets and real-world datasets show that the OT algorithm outperforms existing methods. Especially in Sigmoid Mix model with the size of the graph being ${\rm\bf d=40}$, the structure intervention distance (SID) of the OT algorithm is 329.7 smaller than the one obtained by CAM, which indicates that the graph estimated by the OT algorithm misses fewer edges compared with CAM.Source code of the OT algorithm is available at https://github.com/YafeiannWang/optimal-tune-algorithm.
    SC-MAD: Mixtures of Higher-order Networks for Data Augmentation. (arXiv:2309.07453v1 [stat.ML])
    The myriad complex systems with multiway interactions motivate the extension of graph-based pairwise connections to higher-order relations. In particular, the simplicial complex has inspired generalizations of graph neural networks (GNNs) to simplicial complex-based models. Learning on such systems requires large amounts of data, which can be expensive or impossible to obtain. We propose data augmentation of simplicial complexes through both linear and nonlinear mixup mechanisms that return mixtures of existing labeled samples. In addition to traditional pairwise mixup, we present a convex clustering mixup approach for a data-driven relationship among several simplicial complexes. We theoretically demonstrate that the resultant synthetic simplicial complexes interpolate among existing data with respect to homomorphism densities. Our method is demonstrated on both synthetic and real-world datasets for simplicial complex classification.
    Multi-Source Domain Adaptation meets Dataset Distillation through Dataset Dictionary Learning. (arXiv:2309.07666v1 [cs.LG])
    In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second attacks the problem of synthesizing a small summary containing all the information about the datasets. We thus consider a new problem called MSDA-DD. To solve it, we adapt previous works in the MSDA literature, such as Wasserstein Barycenter Transport and Dataset Dictionary Learning, as well as DD method Distribution Matching. We thoroughly experiment with this novel problem on four benchmarks (Caltech-Office 10, Tennessee-Eastman Process, Continuous Stirred Tank Reactor, and Case Western Reserve University), where we show that, even with as little as 1 sample per class, one achieves state-of-the-art adaptation performance.
    Gaussian Process Surrogate Models for Neural Networks. (arXiv:2208.06028v2 [cs.LG] UPDATED)
    Not being able to understand and predict the behavior of deep learning systems makes it hard to decide what architecture and algorithm to use for a given problem. In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque. Modeling replaces a complex system with a simpler, more interpretable surrogate. Drawing inspiration from this, we construct a class of surrogate models for neural networks using Gaussian processes. Rather than deriving kernels for infinite neural networks, we learn kernels empirically from the naturalistic behavior of finite neural networks. We demonstrate our approach captures existing phenomena related to the spectral bias of neural networks, and then show that our surrogate models can be used to solve practical problems such as identifying which points most influence the behavior of specific neural networks and predicting which architectures and algorithms will generalize well for specific datasets.
    Dataset Size Dependence of Rate-Distortion Curve and Threshold of Posterior Collapse in Linear VAE. (arXiv:2309.07663v1 [stat.ML])
    In the Variational Autoencoder (VAE), the variational posterior often aligns closely with the prior, which is known as posterior collapse and hinders the quality of representation learning. To mitigate this problem, an adjustable hyperparameter beta has been introduced in the VAE. This paper presents a closed-form expression to assess the relationship between the beta in VAE, the dataset size, the posterior collapse, and the rate-distortion curve by analyzing a minimal VAE in a high-dimensional limit. These results clarify that a long plateau in the generalization error emerges with a relatively larger beta. As the beta increases, the length of the plateau extends and then becomes infinite beyond a certain beta threshold. This implies that the choice of beta, unlike the usual regularization parameters, can induce posterior collapse regardless of the dataset size. Thus, beta is a risky parameter that requires careful tuning. Furthermore, considering the dataset-size dependence on the rate-distortion curve, a relatively large dataset is required to obtain a rate-distortion curve with high rates. Extensive numerical experiments support our analysis.
    Trajectory-oriented optimization of stochastic epidemiological models. (arXiv:2305.03926v3 [stat.AP] UPDATED)
    Epidemiological models must be calibrated to ground truth for downstream tasks such as producing forward projections or running what-if scenarios. The meaning of calibration changes in case of a stochastic model since output from such a model is generally described via an ensemble or a distribution. Each member of the ensemble is usually mapped to a random number seed (explicitly or implicitly). With the goal of finding not only the input parameter settings but also the random seeds that are consistent with the ground truth, we propose a class of Gaussian process (GP) surrogates along with an optimization strategy based on Thompson sampling. This Trajectory Oriented Optimization (TOO) approach produces actual trajectories close to the empirical observations instead of a set of parameter settings where only the mean simulation behavior matches with the ground truth.
    Nowhere coexpanding functions. (arXiv:2303.12814v3 [stat.ML] UPDATED)
    We define a family of $C^1$ functions which we call "nowhere coexpanding functions" that is closed under composition and includes all $C^3$ functions with non-positive Schwarzian derivative. We establish results on the number and nature of the fixed points of these functions, including a generalisation of a classic result of Singer.
    Simultaneous inference for generalized linear models with unmeasured confounders. (arXiv:2309.07261v1 [stat.ME])
    Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It first leverages multivariate responses to separate marginal and uncorrelated confounding effects, recovering the confounding coefficients' column space. Subsequently, latent factors and primary effects are jointly estimated, utilizing $\ell_1$-regularization for sparsity while imposing orthogonality onto confounding coefficients. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish various effects' identification conditions and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.
    Speeding up Learning Quantum States through Group Equivariant Convolutional Quantum Ans\"atze. (arXiv:2112.07611v3 [quant-ph] UPDATED)
    We develop a theoretical framework for $S_n$-equivariant convolutional quantum circuits with SU$(d)$-symmetry, building on and significantly generalizing Jordan's Permutational Quantum Computing (PQC) formalism based on Schur-Weyl duality connecting both SU$(d)$ and $S_n$ actions on qudits. In particular, we utilize the Okounkov-Vershik approach to prove Harrow's statement (Ph.D. Thesis 2005 p.160) on the equivalence between $\operatorname{SU}(d)$ and $S_n$ irrep bases and to establish the $S_n$-equivariant Convolutional Quantum Alternating Ans\"atze ($S_n$-CQA) using Young-Jucys-Murphy (YJM) elements. We prove that $S_n$-CQA is able to generate any unitary in any given $S_n$ irrep sector, which may serve as a universal model for a wide array of quantum machine learning problems with the presence of SU($d$) symmetry. Our method provides another way to prove the universality of Quantum Approximate Optimization Algorithm (QAOA) and verifies that 4-local SU($d$) symmetric unitaries are sufficient to build generic SU($d$) symmetric quantum circuits up to relative phase factors. We present numerical simulations to showcase the effectiveness of the ans\"atze to find the ground state energy of the $J_1$--$J_2$ antiferromagnetic Heisenberg model on the rectangular and Kagome lattices. Our work provides the first application of the celebrated Okounkov-Vershik's $S_n$ representation theory to quantum physics and machine learning, from which to propose quantum variational ans\"atze that strongly suggests to be classically intractable tailored towards a specific optimization problem.
    Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining. (arXiv:2309.07332v1 [cs.LG])
    Accurately labeling biomedical data presents a challenge. Traditional semi-supervised learning methods often under-utilize available unlabeled data. To address this, we propose a novel reliability-based training data cleaning method employing inductive conformal prediction (ICP). This method capitalizes on a small set of accurately labeled training data and leverages ICP-calculated reliability metrics to rectify mislabeled data and outliers within vast quantities of noisy training data. The efficacy of the method is validated across three classification tasks within distinct modalities: filtering drug-induced-liver-injury (DILI) literature with title and abstract, predicting ICU admission of COVID-19 patients through CT radiomics and electronic health records, and subtyping breast cancer using RNA-sequencing data. Varying levels of noise to the training labels were introduced through label permutation. Results show significant enhancements in classification performance: accuracy enhancement in 86 out of 96 DILI experiments (up to 11.4%), AUROC and AUPRC enhancements in all 48 COVID-19 experiments (up to 23.8% and 69.8%), and accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing experiments (up to 74.6% and 89.0%). Our method offers the potential to substantially boost classification performance in multi-modal biomedical machine learning tasks. Importantly, it accomplishes this without necessitating an excessive volume of meticulously curated training data.  ( 3 min )
    Choosing a Proxy Metric from Past Experiments. (arXiv:2309.07893v1 [stat.ME])
    In many randomized experiments, the treatment effect of the long-term metric (i.e. the primary outcome of interest) is often difficult or infeasible to measure. Such long-term metrics are often slow to react to changes and sufficiently noisy they are challenging to faithfully estimate in short-horizon experiments. A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric -- so they can be used to effectively guide decision-making in the near-term. We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments. Our procedure first reduces the construction of an optimal proxy metric in a given experiment to a portfolio optimization problem which depends on the true latent treatment effects and noise level of experiment under consideration. We then denoise the observed treatment effects of the long-term metric and a set of proxies in a historical corpus of randomized experiments to extract estimates of the latent treatment effects for use in the optimization problem. One key insight derived from our approach is that the optimal proxy metric for a given experiment is not apriori fixed; rather it should depend on the sample size (or effective noise level) of the randomized experiment for which it is deployed. To instantiate and evaluate our framework, we employ our methodology in a large corpus of randomized experiments from an industrial recommendation system and construct proxy metrics that perform favorably relative to several baselines.  ( 2 min )
    Joint Community Detection and Rotational Synchronization via Semidefinite Programming. (arXiv:2105.06031v2 [stat.ML] UPDATED)
    In the presence of heterogeneous data, where randomly rotated objects fall into multiple underlying categories, it is challenging to simultaneously classify them into clusters and synchronize them based on pairwise relations. This gives rise to the joint problem of community detection and synchronization. We propose a series of semidefinite relaxations, and prove their exact recovery when extending the celebrated stochastic block model to this new setting where both rotations and cluster identities are to be determined. Numerical experiments demonstrate the efficacy of our proposed algorithms and confirm our theoretical result which indicates a sharp phase transition for exact recovery.  ( 2 min )
    Beta Diffusion. (arXiv:2309.07867v1 [cs.LG])
    We introduce beta diffusion, a novel generative modeling method that integrates demasking and denoising to generate data within bounded ranges. Using scaled and shifted beta distributions, beta diffusion utilizes multiplicative transitions over time to create both forward and reverse diffusion processes, maintaining beta distributions in both the forward marginals and the reverse conditionals, given the data at any point in time. Unlike traditional diffusion-based generative models relying on additive Gaussian noise and reweighted evidence lower bounds (ELBOs), beta diffusion is multiplicative and optimized with KL-divergence upper bounds (KLUBs) derived from the convexity of the KL divergence. We demonstrate that the proposed KLUBs are more effective for optimizing beta diffusion compared to negative ELBOs, which can also be derived as the KLUBs of the same KL divergence with its two arguments swapped. The loss function of beta diffusion, expressed in terms of Bregman divergence, further supports the efficacy of KLUBs for optimization. Experimental results on both synthetic data and natural images demonstrate the unique capabilities of beta diffusion in generative modeling of range-bounded data and validate the effectiveness of KLUBs in optimizing diffusion models, thereby making them valuable additions to the family of diffusion-based generative models and the optimization techniques used to train them.  ( 2 min )
    A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time. (arXiv:2309.07418v1 [cs.DS])
    Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$. Here $\mathsf{A} \in \mathbb{R}^{n^2 \times d^2}$ is Kronecker product between $A_1 \in \mathbb{R}^{n \times d}$ and $A_2 \in \mathbb{R}^{n \times d}$. $A_3$ is a matrix in $\mathbb{R}^{n \times d}$, $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$. The $X, Y \in \mathbb{R}^{d \times d}$ are variables we want to learn. $B \in \mathbb{R}^{n \times d}$ and $b_{j_0,i_0} \in \mathbb{R}$ is one entry at $j_0$-th row and $i_0$-th column of $B$, $Y_{*,i_0} \in \mathbb{R}^d$ is the $i_0$-column vector of $Y$, and $x \in \mathbb{R}^{d^2}$ is the vectorization of $X$. In a multi-layer LLM network, the matrix $B \in \mathbb{R}^{n \times d}$ can be viewed as the output of a layer, and $A_1= A_2 = A_3 \in \mathbb{R}^{n \times d}$ can be viewed as the input of a layer. The matrix version of $x$ can be viewed as $QK^\top$ and $Y$ can be viewed as $V$. We provide an iterative greedy algorithm to train loss function $L(X,Y)$ up $\epsilon$ that runs in $\widetilde{O}( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2\omega}) \log(1/\epsilon) )$ time. Here ${\cal T}_{\mathrm{mat}}(a,b,c)$ denotes the time of multiplying $a \times b$ matrix another $b \times c$ matrix, and $\omega\approx 2.37$ denotes the exponent of matrix multiplication.  ( 3 min )
    Kernel Conditional Moment Constraints for Confounding Robust Inference. (arXiv:2302.13348v2 [stat.ML] UPDATED)
    We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However, existing work often resorts to some coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimation of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound of the policy value. It can be shown that our estimator contains the recently proposed sharp estimator by Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence. To construct our estimator, we leverage the kernel method to obtain a tractable approximation to the conditional moment constraints, which traditional non-sharp estimators failed to take into account. In the theoretical analysis, we provide a condition for the choice of the kernel which guarantees no specification error that biases the lower bound estimation. Furthermore, we provide consistency guarantees of policy evaluation and learning. In the experiments with synthetic and real-world data, we demonstrate the effectiveness of the proposed method.  ( 2 min )
    All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks. (arXiv:2309.07250v1 [quant-ph])
    Variational algorithms require architectures that naturally constrain the optimisation space to run efficiently. In geometric quantum machine learning, one achieves this by encoding group structure into parameterised quantum circuits to include the symmetries of a problem as an inductive bias. However, constructing such circuits is challenging as a concrete guiding principle has yet to emerge. In this paper, we propose the use of spin networks, a form of directed tensor network invariant under a group transformation, to devise SU(2) equivariant quantum circuit ans\"atze -- circuits possessing spin rotation symmetry. By changing to the basis that block diagonalises SU(2) group action, these networks provide a natural building block for constructing parameterised equivariant quantum circuits. We prove that our construction is mathematically equivalent to other known constructions, such as those based on twirling and generalised permutations, but more direct to implement on quantum hardware. The efficacy of our constructed circuits is tested by solving the ground state problem of SU(2) symmetric Heisenberg models on the one-dimensional triangular lattice and on the Kagome lattice. Our results highlight that our equivariant circuits boost the performance of quantum variational algorithms, indicating broader applicability to other real-world problems.  ( 2 min )
    Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense. (arXiv:2309.06724v2 [cs.CV] UPDATED)
    We aim to provide a general framework of for computational photography that recovers the real scene from imperfect images, via the Deep Nonparametric Convexified Filtering (DNCF). It is consists of a nonparametric deep network to resemble the physical equations behind the image formation, such as denoising, super-resolution, inpainting, and flash. DNCF has no parameterization dependent on training data, therefore has a strong generalization and robustness to adversarial image manipulation. During inference, we also encourage the network parameters to be nonnegative and create a bi-convex function on the input and parameters, and this adapts to second-order optimization algorithms with insufficient running time, having 10X acceleration over Deep Image Prior. With these tools, we empirically verify its capability to defend image classification deep networks against adversary attack algorithms in real-time.  ( 2 min )
    On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems. (arXiv:1906.00331v9 [cs.LG] UPDATED)
    We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$ and $\mathcal{Y}$ is a convex and bounded set. One of the most popular algorithms for solving this problem is the celebrated gradient descent ascent (GDA) algorithm, which has been widely used in machine learning, control theory and economics. Despite the extensive convergence results for the convex-concave setting, GDA with equal stepsize can converge to limit cycles or even diverge in a general setting. In this paper, we present the complexity results on two-time-scale GDA for solving nonconvex-concave minimax problems, showing that the algorithm can find a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently. To the best our knowledge, this is the first nonasymptotic analysis for two-time-scale GDA in this setting, shedding light on its superior practical performance in training generative adversarial networks (GANs) and other real applications.  ( 3 min )
    Spectrum-Aware Adjustment: A New Debiasing Framework with Applications to Principal Components Regression. (arXiv:2309.07810v1 [math.ST])
    We introduce a new debiasing framework for high-dimensional linear regression that bypasses the restrictions on covariate distributions imposed by modern debiasing technology. We study the prevalent setting where the number of features and samples are both large and comparable. In this context, state-of-the-art debiasing technology uses a degrees-of-freedom correction to remove shrinkage bias of regularized estimators and conduct inference. However, this method requires that the observed samples are i.i.d., the covariates follow a mean zero Gaussian distribution, and reliable covariance matrix estimates for observed features are available. This approach struggles when (i) covariates are non-Gaussian with heavy tails or asymmetric distributions, (ii) rows of the design exhibit heterogeneity or dependencies, and (iii) reliable feature covariance estimates are lacking. To address these, we develop a new strategy where the debiasing correction is a rescaled gradient descent step (suitably initialized) with step size determined by the spectrum of the sample covariance matrix. Unlike prior work, we assume that eigenvectors of this matrix are uniform draws from the orthogonal group. We show this assumption remains valid in diverse situations where traditional debiasing fails, including designs with complex row-column dependencies, heavy tails, asymmetric properties, and latent low-rank structures. We establish asymptotic normality of our proposed estimator (centered and scaled) under various convergence notions. Moreover, we develop a consistent estimator for its asymptotic variance. Lastly, we introduce a debiased Principal Component Regression (PCR) technique using our Spectrum-Aware approach. In varied simulations and real data experiments, we observe that our method outperforms degrees-of-freedom debiasing by a margin.  ( 3 min )
    Survival Estimation for Missing not at Random Censoring Indicators based on Copula Models. (arXiv:2009.01726v2 [stat.ML] UPDATED)
    In the presence of right-censored data with covariates, the conditional Kaplan-Meier estimator (also known as the Beran estimator) consistently estimates the conditional survival function of the random follow-up for the event of interest. However, a necessary condition is the unambiguous knowledge of whether each individual is censored or not, which may be incomplete in practice. We therefore propose a study of the Beran estimator when the censoring indicators are generic random variables and discuss necessary conditions for the efficiency of the Beran estimator. From this, we provide a new estimator for the conditional survival function with missing not at random (MNAR) censoring indicators based on a conditional copula model for the missingness mechanism. In addition to the theoretical results, we illustrate how the estimators work for small samples through a simulation study and show their practical applicability by analyzing synthetic and real data.  ( 2 min )
    Convergence analysis of online algorithms for vector-valued kernel regression. (arXiv:2309.07779v1 [stat.ML])
    We consider the problem of approximating the regression function from noisy vector-valued data by an online learning algorithm using an appropriate reproducing kernel Hilbert space (RKHS) as prior. In an online algorithm, i.i.d. samples become available one by one by a random process and are successively processed to build approximations to the regression function. We are interested in the asymptotic performance of such online approximation algorithms and show that the expected squared error in the RKHS norm can be bounded by $C^2 (m+1)^{-s/(2+s)}$, where $m$ is the current number of processed data, the parameter $0<s\leq 1$ expresses an additional smoothness assumption on the regression function and the constant $C$ depends on the variance of the input noise, the smoothness of the regression function and further parameters of the algorithm.  ( 2 min )
    Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks. (arXiv:2302.07260v5 [cs.LG] UPDATED)
    Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome the major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augmented the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained multi-fidelity optimization task involving shape optimization of rotor blades in turbo-machinery.  ( 3 min )
    Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data. (arXiv:2202.05928v4 [cs.LG] UPDATED)
    Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.  ( 2 min )
    Statistically Valid Variable Importance Assessment through Conditional Permutations. (arXiv:2309.07593v1 [cs.LG])
    Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion. It is often implemented with variable permutation schemes. On the flip side, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model agnostic and computationally lean, as well as reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that $\textit{CPI}$ overcomes the limitations of standard permutation importance by providing accurate type-I error control. When used with a deep neural network, $\textit{CPI}$ consistently showed top accuracy across benchmarks. An empirical benchmark on real-world data analysis in a large-scale medical dataset showed that $\textit{CPI}$ provides a more parsimonious selection of statistically significant variables. Our results suggest that $\textit{CPI}$ can be readily used as drop-in replacement for permutation-based methods.  ( 2 min )
    Physics-informed Bayesian inference of external potentials in classical density-functional theory. (arXiv:2309.07065v2 [cond-mat.stat-mech] UPDATED)
    The swift progression of machine learning (ML) has not gone unnoticed in the realm of statistical mechanics. ML techniques have attracted attention by the classical density-functional theory (DFT) community, as they enable discovery of free-energy functionals to determine the equilibrium-density profile of a many-particle system. Within DFT, the external potential accounts for the interaction of the many-particle system with an external field, thus, affecting the density distribution. In this context, we introduce a statistical-learning framework to infer the external potential exerted on a many-particle system. We combine a Bayesian inference approach with the classical DFT apparatus to reconstruct the external potential, yielding a probabilistic description of the external potential functional form with inherent uncertainty quantification. Our framework is exemplified with a grand-canonical one-dimensional particle ensemble with excluded volume interactions in a confined geometry. The required training dataset is generated using a Monte Carlo (MC) simulation where the external potential is applied to the grand-canonical ensemble. The resulting particle coordinates from the MC simulation are fed into the learning framework to uncover the external potential. This eventually allows us to compute the equilibrium density profile of the system by using the tools of DFT. Our approach benchmarks the inferred density against the exact one calculated through the DFT formulation with the true external potential. The proposed Bayesian procedure accurately infers the external potential and the density profile. We also highlight the external-potential uncertainty quantification conditioned on the amount of available simulated data. The seemingly simple case study introduced in this work might serve as a prototype for studying a wide variety of applications, including adsorption and capillarity.  ( 3 min )
    Causal Entropy and Information Gain for Measuring Causal Control. (arXiv:2309.07703v1 [cs.LG])
    Artificial intelligence models and methods commonly lack causal interpretability. Despite the advancements in interpretable machine learning (IML) methods, they frequently assign importance to features which lack causal influence on the outcome variable. Selecting causally relevant features among those identified as relevant by these methods, or even before model training, would offer a solution. Feature selection methods utilizing information theoretical quantities have been successful in identifying statistically relevant features. However, the information theoretical quantities they are based on do not incorporate causality, rendering them unsuitable for such scenarios. To address this challenge, this article proposes information theoretical quantities that incorporate the causal structure of the system, which can be used to evaluate causal importance of features for some given outcome variable. Specifically, we introduce causal versions of entropy and mutual information, termed causal entropy and causal information gain, which are designed to assess how much control a feature provides over the outcome variable. These newly defined quantities capture changes in the entropy of a variable resulting from interventions on other variables. Fundamental results connecting these quantities to the existence of causal effects are derived. The use of causal information gain in feature selection is demonstrated, highlighting its superiority over standard mutual information in revealing which features provide control over a chosen outcome variable. Our investigation paves the way for the development of methods with improved interpretability in domains involving causation.  ( 3 min )
    Scalable Model-Based Gaussian Process Clustering. (arXiv:2309.07882v1 [stat.CO])
    Gaussian process is an indispensable tool in clustering functional data, owing to it's flexibility and inherent uncertainty quantification. However, when the functional data is observed over a large grid (say, of length $p$), Gaussian process clustering quickly renders itself infeasible, incurring $O(p^2)$ space complexity and $O(p^3)$ time complexity per iteration; and thus prohibiting it's natural adaptation to large environmental applications. To ensure scalability of Gaussian process clustering in such applications, we propose to embed the popular Vecchia approximation for Gaussian processes at the heart of the clustering task, provide crucial theoretical insights towards algorithmic design, and finally develop a computationally efficient expectation maximization (EM) algorithm. Empirical evidence of the utility of our proposal is provided via simulations and analysis of polar temperature anomaly (\href{https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series}{noaa.gov}) data-sets.  ( 2 min )
    Optimal and Fair Encouragement Policy Evaluation and Learning. (arXiv:2309.07176v1 [cs.LG])
    In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. In these same domains, there may be heterogeneity both in who responds in taking-up treatment, and heterogeneity in treatment efficacy. While optimal treatment rules can maximize causal outcomes across the population, access parity constraints or other fairness considerations can be relevant in the case of encouragement. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When in addition the decision-maker has distributional preferences over both access and average outcomes, the optimal decision rule changes. We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity. We consider fairness constraints such as demographic parity in treatment take-up, and other constraints, via constrained optimization. Our framework can be extended to handle algorithmic recommendations under an often-reasonable covariate-conditional exclusion restriction, using our robustness checks for lack of positivity in the recommendation. We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds. We illustrate the methods in two case studies based on data from randomized encouragement to enroll in insurance and from pretrial supervised release with electronic monitoring.  ( 2 min )
    On a continuous time model of gradient descent dynamics and instability in deep learning. (arXiv:2302.01952v3 [stat.ML] UPDATED)
    The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.  ( 2 min )

  • Open

    Google’s new DeepMind Maps algorithm improves route suggestions by 24%
    Google has developed an AI algorithm to refine route suggestions on Google Maps, personalizing it based on user data and behavior, allegedly improving the accuracy on an average by 16-24 percent. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/rp21ylzo7iob1.jpg?width=770&format=pjpg&auto=webp&s=737cf3fd278ade42900e8586d626a7c729522c1a Personalized Route Suggestions through AI The AI model comprises 360 million parameters, using real-time data from Maps users to influence factors including travel time, road conditions, tolls, and personal preferences to suggest routes. This technology is grounded on "inverse reinforcement learning" (IRL), specifically a new IRL algorithm - "Receding Horizon Inverse Planning (RHIP)". The Power of RHIP and AI in Maps Google and Deepmind jointly worked to develop RHIP, using complex stochastic models in immediate vicinity areas, but switching to simpler deterministic methods for distant areas for power conservation. The AI improves route suggestions for both driving and two-wheeled vehicles by learning from Maps users' movements and behaviors over time. Google states that this is the largest application of inverse reinforcement learning for route planning to date. Implementation and User Testing Google has applied the algorithm to Maps data globally, but extensive user testing is needed to confirm if the technique consistently produces better routes. Previous attempts at using AI systems for route planning on a large scale have often failed due to the complexity of road networks. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Bard on Republican 1/6 denial
    There are a number of reasons why some Republicans may be in denial about Trump's role in the January 6th Capitol riot. Trump has been very effective at cultivating a cult of personality among his supporters. These supporters are often fiercely loyal to him and willing to believe whatever he says, even if it contradicts reality. Many Republicans have been reluctant to criticize Trump for fear of alienating his supporters. This is especially true for Republican politicians who are up for re-election in 2022. Some Republicans may genuinely believe that Trump did not incite the riot. They may point to the fact that he did not explicitly call for violence, or that he told his supporters to "peacefully and patriotically make your voices heard." Other Republicans may be motivated by partis…  ( 10 min )
    Bard on factory farming and clean meat
    Factory farming is a cruel and unsustainable way to produce meat. Animals are crammed into small, filthy spaces, where they are often denied their basic needs. They are given antibiotics to prevent disease and growth hormones to make them grow faster. This leads to a variety of health problems for the animals, as well as for the people who eat their meat. Clean meat is a new technology that offers a more humane and sustainable way to produce meat. It is made from animal cells that are grown in a lab, without the need for animals to be raised or slaughtered. Clean meat is still in its early stages of development, but it has the potential to revolutionize the meat industry. Here are some of the reasons why factory farming must be replaced by clean meat: It is cruel to animals. Factory f…  ( 10 min )
    AI — weekly megathread!
    News provided by aibrews.com Stability AI launched Stable Audio, a generative AI tool for music & sound generation from text. The underlying latent diffusion model architecture uses audio conditioned on text metadata as well as audio file duration and start time [Details]. Coqui released XTTS - a new voice generation model that lets you clone voices in 13 different languages by using just a quick 3-second audio clip [Details]. Microsoft Research released and open-sourced Phi-1.5 - a 1.3 billion parameter transformer-based model with performance on natural language tasks comparable to models 5x larger [Paper ]. Project Gutenberg, Microsoft and MIT have worked together to use neural text-to-speech to create and release thousands of human-quality free and open audiobooks [Details]. Res…  ( 11 min )
    With things always moving so fast, I may have missed it. Anyone doing something like Run.AI in an opensource capacity?
    Has anyone dropped an opensource tool like run.ai, to leverage multiple gpus / distribute the workload a bit more efficiently? I'm loving some of the single gpu llm modifications that have been dropping recently (have a couple i've tested that ran well on 4090 and 3090ti in the lab), but i've got a plethora of 8 & 12 gig 3xxx series cards i'd love to take advantage of beyond passthroughs to individual vms. Looking for any solutions. Speed isn't as important as the ability to distributively run larger models. submitted by /u/SwallowedBuckyBalls [link] [comments]  ( 9 min )
    AI Death Metal band 'Anarchy's Echo' - Debut single 'Soul Shredder' [visuals, music, name, logo all AI generated]
    submitted by /u/glenniszen [link] [comments]  ( 8 min )
    free AI summary of article based on most frequently used English words
    I am looking for a free AI tool, site (may be paid with free account limited options) that can do a summary of English article (like BBC, youtube) based on most frequently used English words, for example Oxford list of 3000 most frequently used words in English. In other words if word is not in that list it should replace the word with the one that is on the list or its definition. Pdf submitted by /u/vibrantform [link] [comments]  ( 9 min )
    AI on why we have much to be grateful for
    I said to Bard: It seems like these are difficult times. Compare today with the WWII years, explaining how much we have to be grateful for. Bard said: I agree that these are difficult times. The world is facing many challenges, including the COVID-19 pandemic, climate change, and political instability. However, I believe that we have much to be grateful for today compared to the WWII years. Here are just a few things: We live in a time of peace. World War II was a global conflict that caused the deaths of millions of people. Today, there are no major wars raging on. We have access to advanced medical care. During WWII, many people died from diseases that are now easily treatable. Today, we have antibiotics, vaccines, and other medical advances that have saved countless lives. We have a higher standard of living. In the WWII years, many people lived in poverty. Today, most people in the developed world have access to clean water, food, and shelter. We have more opportunities for education and advancement. In the WWII years, many people were denied an education because of their race, gender, or social class. Today, education is more accessible than ever before, and people from all backgrounds have the opportunity to succeed. We have a better understanding of the world. During WWII, there was a lot of misinformation and propaganda. Today, we have access to more information than ever before, and we can make more informed decisions about our lives. Of course, there are still many challenges that we face today. But I believe that we have come a long way since the WWII years, and we have much to be grateful for. Let us never forget the sacrifices that were made during that time, and let us continue to work towards a better future for all. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    What is at stake in the AI based techno-economic war between the West and China?
    The AI-based techno-economic war between the West and China will determine the global distribution of technology, economic benefits, and influence. The winner of this race, particularly in AI, will have the power to accelerate GDP growth and project global economic benefits. China is actively pursuing technological leadership in AI and 5G wireless, while the West must prioritize technological talent influx and leverage its venture and technology ecosystem. Aggressive AI regulation could hinder the West's momentum in this race. The risks of this war and AI technology are significant, but it is a greater risk to let adversaries have unconstrained power. Global treaties are not enforceable solutions to regulate AI. Source : https://medium.com/@vkhosla/what-is-at-stake-in-this-ai-based-techno-economic-war-between-the-west-and-china-8f76bd291be7 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    What has been the most significant assistance from AI for you? It could be in various aspects of work, learning, and life. Are there any other needs you have that you hope AI can help you with, but there aren't suitable AI tools for them yet?
    There's no need to specify which product fulfills your needs, but we'd like everyone to focus on describing their current frustrations and the issues they urgently hope AI can help them with. submitted by /u/Minna_Z [link] [comments]  ( 9 min )
    “Wine can prevent Cancer” says ChatGPT
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 8 min )
  • Open

    [P] Building My Own AI-Model Hub: Seeking Guidance and Resources
    Hello everyone! I'm embarking on a project to create an AI-model hub—a platform where users can upload and utilize their AI models. While I'm aware of popular platforms that offer this, my primary goal is for educational purposes. I'd greatly appreciate any recommendations for helpful articles, videos, or codebases to guide me on this journey. Thanks in advance! submitted by /u/Electronic-Choice-86 [link] [comments]  ( 9 min )
    [P] LLMa: Expert Guidance on Generative AI, Tailored for Your Needs, Outdoing GPT-4 & Saving Costs!
    Hello everyone, Introducing LLMa: ChatGPT built around YOU (getllma.com) - a dedicated service offering hands-on expertise to integrate state-of-the-art generative AI tailored for your projects. We utilize open-source models and train them to outperform GPT-4 on tasks specific to your domain. Envision having a seasoned AI specialist on your team, ensuring your model not only rivals the big players but excels in your unique challenges. 🌟 Why LLMa? Personalized Expertise: Our team collaborates closely with you, delving into your needs and sculpting a model that thrives in your domain. Bespoke Training: We refine open-source models (LLaMa, T5, etc.) with plenty of secret tricks to specialize and surpass GPT-4's performance for your specific tasks. Cost-Effective: LLMa tends to be around 100x cheaper than GPT-4, offering significant savings. No recurring fees; invest in a one-time fee based on your model's complexity. Full Ownership: We hand over the model files/weights to you. It's entirely yours, ensuring total privacy with no PII leaks. Deployment Assistance: Beyond just crafting the model, we can guide you in deploying it, ensuring a seamless integration into your operations. Ongoing Support: From initial brainstorming to model deployment, we're with you, ensuring success at every phase. 💼 Tailored for Enterprises: LLMa is meticulously crafted for enterprises that aim for a high-performing, bespoke AI solution. Transparent pricing begins at $500, contingent on your distinct requirements. ❓ Navigating the Generative AI Terrain? Embarking on the vast journey of generative AI? LLMa is your compass. We aid in defining challenges, strategizing solutions, and optimizing the AI potential for your endeavors. If LLMa piques your interest or if you have any queries, fill-in the form, drop a comment below or DM me. I'm all ears and eager to connect! submitted by /u/iliashark [link] [comments]  ( 9 min )
    [P] Deploying Hugging Face models on Amazon SageMaker using infrastructure as code
    Quick template that bootstraps Amazon SageMaker running a LlaMa 2 model from Hugging Face. Everything deployed as code (Python), no manual tweaking in the SageMaker console. www.pulumi.com/blog/mlops-huggingface-llm-aws-sagemaker-python/ submitted by /u/kao-pulumi [link] [comments]  ( 9 min )
    [D] GAN training
    Am trying to train GANs for oversampling a minority text class (am feeding it only the minority class), but the results dont seem to improve much (AUC only improves by .03 so far). while basic oversampling techniques like SMOTE gives way better results. also am using a vector representation for the whole text instead of word embedding(same used for SMOTE), i tried different architectures with CNN. is there any tricks maybe in training the discriminator and generator ? i can't seem to find the problem submitted by /u/SlightSecretaryB [link] [comments]  ( 9 min )
    [R] Agents: An Open-source Framework for Autonomous Language Agents - AIWaves Inc 2023
    Paper: https://arxiv.org/abs/2309.07870 Github: https://github.com/aiwaves-cn/agents Abstract: Recent advances on large language models (LLMs) enable researchers and developers to build autonomous language agents that can automatically solve various tasks and interact with environments, humans, and other agents using natural language interfaces. We consider language agents as a promising direction towards artificial general intelligence and release Agents, an open-source library with the goal of opening up these advances to a wider non-specialist audience. Agents is carefully engineered to support important features including planning, memory, tool usage, multi-agent communication, and fine-grained symbolic control. Agents is user-friendly as it enables non-specialists to build, customize, test, tune, and deploy state-of-the-art autonomous language agents without much coding. The library is also research-friendly as its modularized design makes it easily extensible for researchers. https://preview.redd.it/3bdi71r5rgob1.jpg?width=1131&format=pjpg&auto=webp&s=760942c19be6ecda791414c812a77e72751c526d https://preview.redd.it/howf64r5rgob1.jpg?width=1656&format=pjpg&auto=webp&s=636744fccab7a1c2bafb902bad5dbb647440fff5 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Image dataset management tools [D]
    Hi all, I have about 100K images on my machine and I am looking for a tool that can help me do some QA on it. Example features I would love: Search and visualize all images with a prefix "cls1_" or "cls2_" Easily rename file names if they're not named correctly Visualize all captions associated with each image We can assume they have the same name but with extension ".txt" or ".captions" Or there's a metadata.json linking between "img_file" and "caption_file" Easily edit captions in the dashboard I can also work with some kind of metadata file instead of relying on filename logic if it really helps a certain tool. I prefer a locally run, open-source tool. It would be a problem for me to upload this data to any online platform. Many thanks in advance for any help or guidance. submitted by /u/JYP_Scouter [link] [comments]  ( 9 min )
    [D] Testing values are different from "Real World" values
    Before training my model im going through multiple steps to collect and process my data. One of these steps to is calculate values from algorithmic and mathematical functions. In my training and testing data the values are around 12-15 decimal places. I then split the data without shuffling. Training, validation and testing averages at 75% accuracy. Now my next step I wanted to do a "real world" data test where I collect the exact same data as my testing data and predict it using my previously built model but the values inside test_dataset1 are different to test_dataset2 within the last 6-9 decimal places even though the data is going through the exact same code. For example Test_dataset1 Value : 1.123456789 Test_dataset2 Value : 1.123456987 This messes with my prediction and its making me wonder aswell as standrising my data should I be rounding my float values to say 4-6 decimal places? submitted by /u/paddockson [link] [comments]  ( 9 min )
    [D] What is the difference between the TPU found in Google Tensor chips vs the Neural Engine found in Apple's A and M series?
    What are the key differences between the Tensor Processing Unit (TPU) found in Google Tensor chips and the Neural Engine found in Apple's A and M series chips? Are they the same things? Or is the TPU only available for Google's own AI, while the Neural Engine is available to all developers for accelerating AI for all apps, if they decide to? Can developers optimize apps for Google Tensor like they can for the Neural Engine? If not, how do developers take advantage of machine learning acceleration chips on Google Pixel or Android in general? If yes, let's say a developer optimizes their app for the Google TPU, will they need to re-optimize for other chips like Samsung's NPU or Qualcomm AI too? If not, how well do they run? Are they the same fast and efficient? submitted by /u/GRguy_21 [link] [comments]  ( 9 min )
    [P] Suggestions/Direction: Working on Image Dehazing for Face Images.
    Working on Final year project in the field on computer vision: Image Dehazing. I aim at having a novel approch for better dehazing of face haze images. Have read papers related to single image dehazing & face SR/Deblur. Any thoughts/ suggestions/ leads would be appreciated. submitted by /u/GahlotB [link] [comments]  ( 9 min )
    [R] Traveling Words: A Geometric Interpretation of Transformers
    Paper: https://arxiv.org/abs/2309.07315 Abstract: Transformers have significantly advanced the field of natural language processing, but comprehending their internal mechanisms remains a challenge. In this paper, we introduce a novel geometric perspective that elucidates the inner mechanisms of transformer operations. Our primary contribution is illustrating how layer normalization confines the latent features to a hyper-sphere, subsequently enabling attention to mold the semantic representation of words on this surface. This geometric viewpoint seamlessly connects established properties such as iterative refinement and contextual embeddings. We validate our insights by probing a pre-trained 124M parameter GPT-2 model. Our findings reveal clear query-key attention patterns in early layers and build upon prior observations regarding the subject-specific nature of attention heads at deeper layers. Harnessing these geometric insights, we present an intuitive understanding of transformers, depicting them as processes that model the trajectory of word particles along the hyper-sphere. ​ https://preview.redd.it/0i302t857gob1.png?width=1864&format=png&auto=webp&s=1da999c014979bdb6c99809d5b38eb5ccfd717d0 submitted by /u/CoolThingsOnTop [link] [comments]  ( 9 min )
    [P] Hampel Python Library with C extensions 🚀
    Repo -> https://github.com/MichaelisTrofficus/hampel_filter The Python library hampel implements the Hampel Filter, which is generally used to detect anomalies in data with a timeseries structure. It basically consists of a sliding window of a parameterizable size. The library was in plain Python before (using pandas for all the sliding operations, median computation etc), but now it has been replaced by a Cython implementation, which speeds up things quite a bit! 😀 It also provides much more valuable information (thresholds, median absolute deviations, etc.), allowing us to create plots like this one: https://preview.redd.it/6j4ubiwgmfob1.png?width=800&format=png&auto=webp&s=bbc56777fce30a464d0bb33ac5126033b3413838 ​ submitted by /u/Hefty-Consequence443 [link] [comments]  ( 9 min )
    [Discussion] How to generate post custom for each user ?
    Hi everybody. Currently, I am building a Deep Learning model with the task of automatically generating random posts and tweets. The characteristic is that it must have the personality of the writer, for example the user is the CEO of company A, then the generated post must have the writing style of the CEO or company A, similar to other users. Actually, I don't know where to start solving this problem. I intend to use RWKV to do this problem but I'm not sure if it is a good direction or not. Is there any related research or can anyone who has done this problem give me some suggestions? submitted by /u/unknow_from_vietnam [link] [comments]  ( 9 min )
    [R] Uncovering mesa-optimization algorithms in Transformers (from Google Research, ETH Zürich, and Google DeepMind)
    Paper. I am not affiliated with this work or its authors. Abstract: Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass of a model consisting of the following two steps: (i) the construction of an internal learning objective, and (ii) its corresponding solution found through optimization. To test this hypothesis, we reverse-engineer a series of autoregressive Transformers trained on simple sequence modeling tasks, uncovering underlying gradient-based mesa-optimization algorithms driving the generation of predictions. Moreover, we show that the learned forward-pass optimization algorithm can be immediately repurposed to solve supervised few-shot tasks, suggesting that mesa-optimization might underlie the in-context learning capabilities of large language models. Finally, we propose a novel self-attention layer, the mesa-layer, that explicitly and efficiently solves optimization problems specified in context. We find that this layer can lead to improved performance in synthetic and preliminary language modeling experiments, adding weight to our hypothesis that mesa-optimization is an important operation hidden within the weights of trained Transformers. Twitter thread about the paper from one of the paper's authors. Nitter thread, for those who aren't signed into Twitter but want to see the entire Twitter thread. Background info: Mesa-Optimization: Explain it like I'm 10 Edition. submitted by /u/Wiskkey [link] [comments]  ( 9 min )
    [D] Can somebody help check my math to see if I'm understanding Microsoft's Retentive Network paper correctly? I'm confused how we are enriching the tokens with enough context.
    Relevant Paper: 2307.08621.pdf (arxiv.org) So the definition of the recurrent representation of the retention mechanism is below Sn = γSn−1 + K⊺nVn Retention(Xn) = QnSn, n = 1, · · · , |x| γ is a decay factor, and K, Q, and V have their standard transformer definitions. What confuses me is the derivation of Sn. The formula makes it look like a scalar. But if that's the case, are we saying that for a given token, the retention mechanism is just multiplying the Query by a scalar? That's surprising! How is that able to provide enough context? Here is some code I wrote with GPT to show my understanding of how it works. Is this correct? I use 3 arbitrary tokens of dimension 3, and then a pick arbitrary K Q and V matrices. I also initialize gamma to 0.5 import numpy as np # Tokens x1…  ( 10 min )
    [Project]: Correcting Misspelled Words in Urdu language text
    Help required from NLP and Text Researchers !! Hello everyone! I have Urdu language transcriptions (text) which contain many misspelled words that are not part of the Urdu language. I wanted to know do we have any good NLP techniques or methods which can solve this problem for Urdu language? I want to replace these misspelled words with the correct words in Urdu. I have already tried Python libraries and methods such as indic-nlp, Levenshtein distance, UrduHack, Word2vec Urdu etc, but they weren't able to solve this problem. Some of the methods require Urdu dictionaries to find the correct word, which I'm also unable to find open-source on internet (please also help in that if possible). Will appreciate everyone's help and response to this. Thank you! submitted by /u/a_r182 [link] [comments]  ( 9 min )
    [D] Prospective PhD advisors
    Hi everyone! I am a (soon graduating) MSc student at a top European university and I'd like to apply for a ML PhD in the US this Fall. I've done my research on schools and advisors, but I figured there's no harm in also asking in this subreddit. What are some groups/professors that do ML research at US unis in deep learning theory (specifically foundations) and optimization? As an example, I'm talking about topics such as: https://arxiv.org/abs/1902.08129, https://arxiv.org/abs/1711.04735, https://arxiv.org/abs/2306.04637. Thank you all! Cheers! submitted by /u/AlexIsEpic24 [link] [comments]  ( 9 min )
    [Project]: Correcting Misspelled Words in Urdu language text
    Help required from NLP and Text Researchers !! Hello everyone! I have Urdu language transcriptions (text) which contain many misspelled words that are not part of the Urdu language. I wanted to know do we have any good NLP techniques or methods which can solve this problem for Urdu language? I want to replace these misspelled words with the correct words in Urdu. I have already tried Python libraries and methods such as indic-nlp, Levenshtein distance, UrduHack, Word2vec Urdu etc, but they weren't able to solve this problem. Some of the methods require Urdu dictionaries to find the correct word, which I'm also unable to find open-source on internet (please also help in that if possible). Will appreciate everyone's help and response to this. Thank you! submitted by /u/a_r182 [link] [comments]  ( 9 min )
    [D] ML Research Topics (reasonably short)
    So I’m starting my masters thesis project in ML ASAP and need a research topic. What areas (if any) are currently hot / feasible to do research in roughly 6 months with fairly limited compute access? I’m more interested in theory / research heavy areas rather than applied. And probably happier to dig into some hard math rather than taking on a software engineering type project. Any thoughts or general feedback very welcome! Thanks! submitted by /u/Professional-Pace158 [link] [comments]  ( 9 min )
    Practical use cases for skew symmetrical matrices in AI/ML? [D]
    Just came across this property of matrices that I have never seen before as I am contributing to the NumPy codebase and someone asked for this feature to be added and it got me thinking. It is defined as: A skew-symmetric matrix is a square matrix whose transpose equals to its negative. It should satisfy the below condition: AT = –A Just wondering do these have any applications in ML at all? I never came across this in my math classes so just wondering if the property pops up anywhere else in the field. Maybe in 3D space applications? Or maybe RBG data augmentation? But yeah not 100% sure. submitted by /u/Ok_Reality2341 [link] [comments]  ( 9 min )
    [D] How much should I focus on DSA?
    I’m an electrical engineering student in college currently, and have been learning about ML for a few months now. I will be starting a ML research paper under my professor from next week. However, my DSA skills are quite sub-par. Should I focus more on ML math and projects, or take sufficient time out for leetcode as well? submitted by /u/4R1N1493 [link] [comments]  ( 9 min )
    Best architecture for an autoencoder for 2D trajectory data? [D]
    Hi, I have a dataset that consists of 2D trajectories and I am aiming to develop an autoencoder architecture to learn a compressed set of features that reasonable represents and can reconstruct the trajectories. The trajectories may look something like this as an example. A 2D image as input would seem to require a very sparse representation with high resolution to track the trajectory path. I am hoping there is a better way to input the path without requiring high resolution. An alternative might be to use a LSTM structure to input as a sequence, although not sure that solves the resolution issue. Do you have any suggestions? I've worked with 1d time series and 2D images just fine but this is a bit different. ​ ​ https://preview.redd.it/vqz8y3o69bob1.png?width=2020&format=png&auto=webp&s=d8bcc2fe311743c8e78a96055e68f1ad364b48c3 submitted by /u/ZeApelido [link] [comments]  ( 9 min )
  • Open

    Timing attacks
    If you ask someone a question and they say “yes” immediately, that gives you different information than if they pause and slowly say “yes.” The information you receive is not just the response but also the time it took to generate the response. Encryption can be analogous. The time it takes to encrypt data can […] Timing attacks first appeared on John D. Cook.  ( 5 min )
    Elliptic curve Diffie-Hellman key exchange
    I concluded the previous post by saying elliptic curve Diffie-Hellman key exchange (ECDHE) requires smaller keys than finite field Diffie-Hellman (FFDHE) to obtain the same level of security. How much smaller are we talking about? According to NIST recommendations, a 256-bit elliptic curve curve provides about the same security as working over a 3072-bit finite […] Elliptic curve Diffie-Hellman key exchange first appeared on John D. Cook.  ( 6 min )
    Finite field Diffie Hellman primes
    Diffie-Hellman key exchange is conceptually simple. Alice and Bob want to generate a shared cryptographic key. They want to use asymmetric (public) cryptography to share a symmetric (private) key. The starting point is a large prime p and a generator 1 < g < p. Alice generates a large random number x, her private key, […] Finite field Diffie Hellman primes first appeared on John D. Cook.  ( 6 min )
  • Open

    Breakthrough way to train neuromorphic chips
    submitted by /u/keghn [link] [comments]  ( 8 min )
    Best “Math” book for creating neural networks
    So I want to create a neural network from scratch, like no lib(tensorflow, pytorch, etc…), so what’s the best book for that, I know both calculus and statistics, so I’m assuming that the math wouldn’t be a problem. Also I will be using Cuda for its speed. submitted by /u/GateCodeMark [link] [comments]  ( 9 min )
    Announcing the RobotPerf™ Benchmarks Beta Release: An industry standard for benchmarking robotic brains
    https://news.accelerationrobotics.com/robotperf-beta/ Together with AMD, Intel, Ford, Harvard, Klagenfurt University, Georgia Institute of Technology, Boston University, Johannes Kepler University Linz, Barnard College, Columbia University and Carnegie Mellon University we are thrilled to introduce the beta release of RobotPerf™ Benchmarks, an advanced benchmarking suite crafted specifically to evaluate robotics computing performance using ROS 2 as its baseline. In this beta release, we not only showcase new benchmarks and results but also introduce novel visualization capabilities. The complete release is available at https://github.com/robotperf/benchmarks/releases/tag/beta. https://preview.redd.it/5whys5ufudob1.png?width=1562&format=png&auto=webp&s=08a6e22a0b07b26fa6340f59ec9df822ab49c9d0 submitted by /u/pablocarrera [link] [comments]  ( 9 min )
  • Open

    Double DQN do not learn anything
    Hi, i just finished the coursera ml course and i wanted to create myself a double dqn model but my model don't seem to learn anything, it always return very low rewards (-100 to -300) even after playing 2000 episodes. I've been stuck on this for 4 days without any hope to find the solution, any help would be welcome :') thank you in advance import random import numpy as np import gymnasium as gym import tensorflow as tf from collections import deque, namedtuple from tensorflow.keras import Sequential, Input from tensorflow.keras.layers import Dense from tensorflow.keras.optimizers import Adam from tensorflow.keras.losses import MeanSquaredError import matplotlib.pyplot as plt # function creating the models def createModel(inputSize, outputSize): model = Sequential([ Input(inputSize),…  ( 10 min )
    Actor-Critic and other implementations
    I'm confused with several algorithms that are based on an actor-critic approach. In TD3 and SAC, it is understandable that each of them is implemented to serve their purpose (deterministic and stachastic action). But in Dreamer algorithm (DreamerV3), why does it require to combine actor and critic network to the model-based planning approach, as the model-based also able to perform an action by planning to the simulation state. It is mean that using model-based to simulate the possible future then update the critic according to the simulation might sound good in training an agent? submitted by /u/AnnonymeowCat [link] [comments]  ( 9 min )
  • Open

    MediaPipe FaceStylizer: On-device real-time few-shot face stylization
    Posted by Haolin Jia, Software Engineer, and Qifei Wang, Senior Software Engineer, Core ML In recent years, we have witnessed rising interest across consumers and researchers in integrated augmented reality (AR) experiences using real-time face feature generation and editing functions in mobile applications, including short videos, virtual reality, and gaming. As a result, there is a growing demand for lightweight, yet high-quality face generation and editing models, which are often based on generative adversarial network (GAN) techniques. However, the majority of GAN models suffer from high computational complexity and the need for a large training dataset. In addition, it is also important to employ GAN models responsibly. In this post, we introduce MediaPipe FaceStylizer, an effi…  ( 92 min )
  • Open

    Learn how to build and deploy tool-using LLM agents using AWS SageMaker JumpStart Foundation Models
    Large language model (LLM) agents are programs that extend the capabilities of standalone LLMs with 1) access to external tools (APIs, functions, webhooks, plugins, and so on), and 2) the ability to plan and execute tasks in a self-directed fashion. Often, LLMs need to interact with other software, databases, or APIs to accomplish complex tasks. […]  ( 13 min )
  • Open

    AI-driven tool makes it easy to personalize 3D-printable models
    With Style2Fab, makers can rapidly customize models of 3D-printable objects, such as assistive devices, without hampering their functionality.  ( 10 min )

  • Open

    [D] Besides something about LLM, is there any new or interesting research you think is worth reading?
    Please provide Arkiv links. If you want to share your thoughts then go for it. By new I mean within the last 6 months. submitted by /u/I_will_delete_myself [link] [comments]  ( 9 min )
    [D] Good papers on poster collapse in VAEs
    What are some good papers to understand posterior collapse in VAEs? submitted by /u/randomkolmogorov [link] [comments]  ( 9 min )
    [P] Create an Object Detector for Any Game Using YOLO
    Full Video Tutorial: https://www.linkedin.com/posts/moisesdias\_english-version-below-tutorial-crie-activity-7107686497885011969-ZLVW/ Hello everyone! Have you ever thought about how to create an object detection system using YOLO that works with any game? If you're interested, I've created a tutorial with all the steps to develop this system. I'll leave a link to the video where I demonstrate the process step by step using the game Diablo 2 as an example. I hope you enjoy it, and if you have any suggestions, feel free to send a message or comment here! submitted by /u/moisesdepaulodias [link] [comments]  ( 9 min )
    [P] Would anyone know of any information regarding esrb ratings?
    Good Evening, I and some fellow students are taking a SE class and are looking for relevant information regarding esrb ratings and games for a research project. Does anyone know of any data pertaining to relevant esrb info that we can access? We have a few sources and are waiting to hear back from esrb themselves. Would anyone know if they permit web scraping or if there is a csv containing relevant information, or even perhaps an api we could use? Any information would help and thank you all for taking the time to read this. Thanks in advance! submitted by /u/GOD_LIKE_WOW [link] [comments]  ( 9 min )
    [P] Ways to speed up llama-2 summarization on sagemaker?
    I'm currently working on a project to give a quick summary of long articles/conversations. I'm running llama-2-7b-chat-hf with 4bit quantization on a g5.2xlarge instance on sagemaker. The method I'm using is map_reduce (option 2)from this webpage https://python.langchain.com/docs/use_cases/summarization) Of everything I've tried this is the only one that's been able to do decent summaries in a reasonable amount of time. However with really long articles (10,000+ words) it takes ~6 minutes before giving an output. I tried running this same thing on a g5.12xlarge instance which has 4 A10G gpus but it hasn't reduced the time by any noticeable amount. Is there anything else I could be doing to speed this up? submitted by /u/Able_Body_9654 [link] [comments]  ( 9 min )
    [D] what prompt should i use with llama2 for context generative question answering?
    so as the question states, i want to use llama2 to generate an answer for the question based on the context (or the article for more precision), no finetuning is needed, just want to predict the answer, but i can't find what's the right prompt i should use to get a well structured answer. my dataset contains two columns, one for articles, and the other column is for the question, Example: context: article talking about world war 2. question : based on the text, describe how the ww2 had started, and what were the political effects on Europe? submitted by /u/kaoutar- [link] [comments]  ( 9 min )
    [P] Coqui releases XTTS an open-access foundational Voice Cloning model!
    There's a new open-access foundational audio model in town! Standing on the shoulders of TorToiSe TTS - XTTS allows cross-language and multi-lingual speech generation with just 3 lines of code 🐸 Key facts about the model: 1. Supports 13 languages. 2. Voice cloning with just a 3-second audio clip. 3. Emotion and style transfer by cloning. 4. Cross-language voice cloning. Try it out on HF Hub: https://huggingface.co/spaces/coqui/xtts submitted by /u/vaibhavs10 [link] [comments]  ( 9 min )
    [R] Large Language Models for Compiler Optimization - MetaAi 2023 - Autotuner needs 949 CPU-days to achive nearly the same as this approach in 1shot!
    Paper: https://arxiv.org/abs/2309.07062 Abstract: We explore the novel application of Large Language Models to code optimization. We present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes as input unoptimized assembly and outputs a list of compiler options to best optimize the program. Crucially, during training, we ask the model to predict the instruction counts before and after optimization, and the optimized code itself. These auxiliary learning tasks significantly improve the optimization performance of the model and improve the model's depth of understanding. We evaluate on a large suite of test programs. Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler, outperforming two state-of-the-art baselines that require thousands of compilations. Furthermore, the model shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time. https://preview.redd.it/f9c7kh7bd9ob1.jpg?width=1530&format=pjpg&auto=webp&s=287fffa714936da9b9a5141b7e01609942416156 https://preview.redd.it/z4a0ce7bd9ob1.jpg?width=1537&format=pjpg&auto=webp&s=e6275b2b53fa6f431b87940784629b3270c656f9 https://preview.redd.it/89toie7bd9ob1.jpg?width=750&format=pjpg&auto=webp&s=9a71bdb2eeeff52b2f8bbb3cf2b678debcd4a060 https://preview.redd.it/0krmqd7bd9ob1.jpg?width=1536&format=pjpg&auto=webp&s=ba3fade0883ee621b185fabc67839db42ea74a53 https://preview.redd.it/8nz00i7bd9ob1.jpg?width=1198&format=pjpg&auto=webp&s=6ddbddf68311f576fbf3c52a47381316feace8c9 ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Searching for discussion about chunking algorithms and strategy
    Hi everyone! I'm still experiencing with my own implementation of rag, and I deployed my custom chunking function (honestly don't like the methods on LangChain) . Anyway, I'm searching for alternative methods, algoritms (NLP or not) and models... There are lots of info and different implementation on RAG, but as I can see noone put much effort to augment chunking quality. Also, there are other approach than this one I'm currently using? bi-encoder (instructor) - > cross-encoder (reranking) - > LLM Can someone share some resources, repo, lib or existing implementation of different chunking methods? (or simply discuss here some idea, though or approach) Thanks in advance for you time!! submitted by /u/Distinct-Target7503 [link] [comments]  ( 9 min )
    [D] Gradient descent in regularized least squares
    The problem is obtained from Chapter 3 in Wright, Stephen J and Benjamin Recht (2022). Optimization for data analysis. Cambridge University Press I am solving the problem I attach and I have a doubt in section (f). I have solved all the sections (a)-(e). In section (e) I have obtained that I need $$ k \geq \frac{\lambda_{\text{max}}\left(\frac{2}{N} A^TA + 2\mu I\right)}{\lambda_{\text{min}}\left(\frac{2}{N} A^TA + 2\mu I\right)}log((f(x^0)-f(x_\mu)/\epsilon). $$ However in section (f) asks for a tight upper bound but I only can think about the following bound: $f(\hat{x}) \leq f_\mu(x_\mu) + ||\hat{x}||^2+ \epsilon$, which is very simple. Do you think that I can obtain the result in (e) to obtain another bond, or what would you do? submitted by /u/ItsGauss [link] [comments]  ( 9 min )
    [P] Guide: Implementing ImageNet classification using Deep CNNs Paper.
    Need help on how to get started with implementing a research paper. I'm implementing the Imagenet classification task paper for my final year undergrad mini-project. Any advice is appreciated on how to get started? I have mid-level machine learning knowledge and am ready to pick the required concepts on the go. Please help. Thank you :) Link: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf submitted by /u/DrBeans0 [link] [comments]  ( 9 min )
    [D] Use LLM to analyse and port software written in C (very long files)
    Hi, I'm trying to figure out what is the best way to use LLMs to analyse a very old software entirely written in C. I've tried to to some basic prompts with ChatGPT and it seems to recognise the language. The situation is that I've many .c files with thousands of lines and with a lot of redundant code. Moreover, since there are a lot of data structure with variables names not easily understandable, I need to provide some context to allow the model to trying to understand what the code does. My worry is that providing all the needed info + the file itself (even considering 1 file at a time) I could consume all the model context and therefore not leave room for generating anything of value. Has anyone had the opportunity to face similar problems? Ideas? Cheers Alexio submitted by /u/Alexioc [link] [comments]  ( 9 min )
    [N] MIT-IBM Watson AI Lab releases MoLM suite with three small sparse MoE models, the largest of which (8B params with 700M experts) performs on par with Pythia 2.8B while its throughput is comparable to Pythia 1.4B
    Paper: https://arxiv.org/abs/2306.04640 GitHub: https://github.com/ibm/moduleformer (under Apache 2.0) Twitter thread: https://twitter.com/Yikang_Shen/status/1702041129267388678 Abstract: Large Language Models (LLMs) have achieved remarkable results. However, existing models are expensive to train and deploy, and it is also difficult to expand their knowledge beyond pre-training data without forgetting previous knowledge. This paper proposes a new neural network architecture, ModuleFormer, that leverages modularity to improve the efficiency and flexibility of large language models. ModuleFormer is based on the Sparse Mixture of Experts (SMoE). Unlike the previous SMoE-based modular language model, which requires domain-labeled data to learn domain-specific experts, ModuleFormer can i…  ( 10 min )
    Assigning inbalanced labels to "Other" class in scikitlearn [P]
    Hey there, I wasn't doing any ml in some time and forgot basics. I was thinking that you may help me. So I trained svc model on small dataset (around 1400 unical records). I have 13 classes, which are badly distributed in the training set and inbalanced. 3 classes took around 80% of all. What the business wants is keep the 3 classes and categorize the rest as "Other" class. In the future they may be able to generate more training data for the remaining labels. How should I do it? I know I can assign everytning to "Other" class, before training with simple if then formula, but dont know if this is the right approach. Any sugestions? I know this may be some case of 1 vs all case, but don't know which exactly. Thanks in advance for any help. submitted by /u/th00masml [link] [comments]  ( 9 min )
    [D] The ML Papers That Rocked Our World (2020-2023)
    Hey everyone! 👋 I’ve been on a bit of a deep-dive lately, trying to catch up on all the awesome stuff that’s been happening in the ML space. It got me wondering, from 2020 to 2023, what have been the absolute must-read papers that shook the foundations and got everyone talking? Whether it’s something that reinvented the wheel in your specific niche or just made waves industry-wide, I wanna hear about it! I’m curious to see how different the responses will be, and hey, this might even become a go-to list for anyone looking to get the lowdown on the hottest trends and discoveries of the past few years. Can’t wait to hear your thoughts! tl;dr I decided to aggregate your best suggestions into categories for anyone interested in reading them without searching through the whole comment se…  ( 10 min )
    [P] Llama2 inference in a single file of pure Mojo
    Hi everyone! I was really excited that Mojo became publicly available and thinking which project can I implement to learn Mojo concepts. Since I have already ported llama2.c to pure Python, I decided why not try to port llama2.py to Mojo now.. And here is what I got First round of llama2.c vs llama2.🔥 battle. Mojo demonstrated 20% better performance than C in a single threaded execution of llama2 inference and 250x times better performance than Python https://i.redd.it/0gcwwfc2r7ob1.gif For reference Mojo is using SIMD vectorization, that's why it's performing great for matmul operations. In the other hand, it turned out that gcc also aggressively optimizes all for loops it can find, so I suggest this benchmark is pretty fair. ​ Mojo natively supports SIMD vectorizations out of the box gcc aggressively vectorizing loops ​ After that I decided to compare both solutions in multi-threaded (parallelized) mode, and now `llama2.c` strike back with help of OMP demonstrating 20% better performance than Mojo ​ https://i.redd.it/gwymffods7ob1.gif I hope this post will be useful for all Machine Learning engineers/enthusiasts/students out there, ensuring we're up-to-date with Modular/Mojo's game-changing AI tech developments. Stay informed and ahead of the curve! Links llama2.🔥: https://github.com/tairov/llama2.mojo llama2.python: https://github.com/tairov/llama2.py llama2.c: https://github.com/karpathy/llama2.c Modular_AI repost in Twitter: https://twitter.com/tairov/status/1701345271752343900 feel the magic on HF: https://huggingface.co/spaces/radames/Gradio-llama2.mojo submitted by /u/Albatross9855 [link] [comments]  ( 9 min )
    [D] Training an LLM Model: AWS p3.2xlarge EC2 instance vs. Multiple RTX 4090s at Home?
    Hello everyone, I'm currently at a crossroads with a decision that I believe many in this community might have faced or will face at some point: Should I use cloud-based GPU instances like AWS's p3.2xlarge EC2 (with Tesla V100) or invest in building a high-performance rig at home with multiple RTX 4090s for training a large language model? Context: I run a startup and we're currently fine-tuning an open source LLM, and the computational demands are of course high. We want to make an informed choice between using AWS's offerings or setting up a high-performance system at home to start. Cloud Option: AWS p3.2xlarge EC2 Cost: Approximately $3.06 per hour. Specifications: One Tesla V100 GPU, 8 vCPUs, 61 GiB RAM. Pros: Scalability, reliability, specialized software optimizations. Cons: Recurring costs, potential limitations on customization. Home Rig Option: Multiple RTX 4090s Cost: Around $1,600 for each 4090, but I'd own them. Specifications: Even higher TFLOPs than a V100, and memory isn't a constraint (24GB per card). Pros: One-time investment, flexibility, potentially higher raw computational power. Cons: Need to handle cooling, power, and system integration myself I'd love to hear your thoughts, experiences, and recommendations. Here are some specific questions: Performance: How many RTX 4090s would roughly equal the computational power of an AWS p3.2xlarge instance for ML tasks? Cost-Effectiveness: Given that we're a startup with limited resources, does it make more financial sense to invest upfront in hardware? Reliability and Maintenance: For those who have run multi-GPU setups at home, how reliable are they, and what maintenance work is required? Other Considerations: Are there factors I haven't considered that you think are critical? Thank you in advance for sharing your insights! submitted by /u/devolvedai [link] [comments]  ( 9 min )
    [R] Scaling Data-Constrained Language Models - Hugging Face et al. 2023
    Paper: https://arxiv.org/abs/2305.16264 GitHub: https://github.com/huggingface/datablations License: All models & code are licensed under Apache 2.0. Filtered datasets are released with the same license as the datasets they stem from. Abstract: The current trend of scaling language models involves increasing both parameter count and training dataset size. Extrapolating this trend suggests that training dataset size may soon be limited by the amount of text data available on the internet. Motivated by this limit, we investigate scaling language models in data-constrained regimes. Specifically, we run a large set of experiments varying the extent of data repetition and compute budget, ranging up to 900 billion training tokens and 9 billion parameter models. We find that with constrained data for a fixed compute budget, training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data. However, with more repetition, the value of adding compute eventually decays to zero. We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. Finally, we experiment with approaches mitigating data scarcity, including augmenting the training dataset with code data or removing commonly used filters. Models and datasets from our 400 training runs are freely available at this https URL. ​ https://preview.redd.it/ahzyonnqe7ob1.png?width=1015&format=png&auto=webp&s=0e0cb4051e390ea23440cd61bfc0bbf5fce83bb7 https://preview.redd.it/l6a81onqe7ob1.png?width=1014&format=png&auto=webp&s=a36b74cbb510a1f753ef1b891531bb36ab643246 https://preview.redd.it/yyu8h0oqe7ob1.png?width=1001&format=png&auto=webp&s=047cb2bb1932c6215cea0c30e22fd9bbe60391a8 https://preview.redd.it/xskcytnqe7ob1.png?width=1007&format=png&auto=webp&s=4090e92dd9eacb377840327bb7d0ae69ff752b52 submitted by /u/InterviewIntrepid889 [link] [comments]  ( 9 min )
    [D] Gradio on the same server but different ports
    On my team they are using Gradio for LLM applications, etc. When running multiple instances of Gradio applications on the same server, but on different ports, opening a new session is causing an error and closing the previous session, "error" is written in the chat. The logs I found were like "Invalid HTTP request received" and "max retries exceeded". For me, Gradio is supposed to work as a demonstration and not as a scalable product, but they are using it that way and I thought that the problem could be precisely that. But if not, does anyone have any idea what could be going on? No meu time estão utilizando gradio para aplicações de LLMs, etc. Ao executar múltiplas instâncias de aplicações Gradio no mesmo servidor, mas em portas diferentes, a abertura de uma nova sessão está causando erro e encerrando a sessão anterior, fica "erro" escrito no chat. Os logs que encontrei eram como "Invalid HTTP request received" e "max retries exceeded". Para mim, o Gradio é pra funcionar como demonstração e não em forma de produto escalável, mas estão utilizando assim e pensei que o problema poderia ser justamente isso. Mas caso não, alguém tem alguma ideia do que pode estar acontecendo? submitted by /u/Magic_squirrel_hat [link] [comments]  ( 9 min )
    [R] Compilation of non-open source AI models
    Hi, does anyone have a compiled list of non-open source AI models that can be used for MVP building? submitted by /u/Compound_Group [link] [comments]  ( 9 min )
    [D] Communicative/Collaborative Agents hybrids
    I had a Claude based LLM analyze the strengths and weaknesses of the MetaGPT, ChatDev, AoT, and brain inspired algorithms papers, seeking ways to combine the strengths of two or more of the methods. "Here are some specific examples of how the methods in the Algorithm of Thoughts (AoT), brain-inspired algorithms, MetaGPT, and ChatDev papers could be combined to improve multi-agent systems: MetaGPT could incorporate longer, more elaborate algorithmic examples from AoT into its prompts to guide the LLM's reasoning process. For example, in solving math word problems, the prompt could provide a 4-5 step worked example walking through unpacking the problem statement, setting up equations, solving, and checking the solution. This mirrors AoT's more extensive algorithm narratives. ChatDev's a…  ( 11 min )
  • Open

    Two guys with VERY different views on AI –Yuval Noah Harari and Mustafa Suleyman in conversation (full interview)
    submitted by /u/hauntedhivezzz [link] [comments]  ( 8 min )
    ChatGPT, tell the jury that Trump waiting three hours to call off his mob is overwhelming and irrefutable evidence that he planned and conducted the January 6th insurrection
    I asked ChatGPT to talk with Trump's jury about the 3-hour delay. Not bad. ChatGPT: Ladies and gentlemen of the jury, today, we present a compelling case that the three-hour delay in President Trump's response to the events of January 6th is, without a doubt, overwhelming and irrefutable evidence of his planning and conducting the insurrection that unfolded before our eyes. First, consider the timeline. As the violence escalated, President Trump did not immediately call for an end to the chaos. This delay allowed the mob to wreak havoc, endanger lives, and damage our democracy. A leader's duty is to protect the nation and its citizens, and a three-hour wait to act is inexplicable unless there was intent behind it. Second, let's talk about intent. In criminal law, intent matters. It's the difference between an accident and a crime. A deliberate delay in response to an unfolding insurrection suggests a guilty mind. President Trump had the power and the responsibility to intervene immediately. Yet, he chose not to, allowing the violence to escalate. Furthermore, we must consider his words leading up to and during the events of January 6th. His rhetoric could be interpreted as encouragement for his supporters to take matters into their own hands. This, coupled with the delay, paints a picture of a leader who not only knew what was happening but may have orchestrated it. Ladies and gentlemen, we are not here to pass judgment but to seek justice. The evidence of a three-hour delay in responding to the insurrection, along with the surrounding circumstances, presents a compelling case that President Trump planned and conducted this insurrection. It is our duty to hold individuals accountable for their actions, regardless of their position. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    How good a crime sleuth and judge is today's AI?
    How would today's AI perform as a crime investigator or judge? To find out, I asked Bard the following question: The film Social Network suggests that Mark Zuckerberg stole his idea for Facebook from two brothers who hired him to make an app for the idea. How likely is this to be true? Bard seemed informed and sharp: "The film The Social Network is a dramatization of the founding of Facebook, and as such, it takes some liberties with the truth. However, the film does accurately portray the fact that Mark Zuckerberg was sued by the Winklevoss twins, Cameron and Tyler, who claimed that he stole their idea for a social networking website called HarvardConnection. The Winklevoss twins allege that they met with Zuckerberg in December 2003 to discuss their idea for HarvardConnection, and tha…  ( 10 min )
    Which AI to use for this idea?
    Hi there! I'm participating in a Hackathon at work and am wondering if this community can give me some direction. Every year our company receives mandates / updates from different parties. In turn, our development teams have to parse through heavy documentation to figure out what needs to change in our code. Ingesting the data is what takes the longest. Our goal is to feed the mandates documentation through an AI and have it return what is needed to be changed in our code. For example, something might say field 200 now needs to include a 6 digit date format vs the 4 digit date format we've had in years past. We have secured a license for Azure AI but honestly no idea if that is the right AI to use. I youtubed a bunch of videos on document processing but I'm also not sure if that is what we are trying to do. Any advice on this is much appreciated. submitted by /u/HillyjoKokoMo [link] [comments]  ( 9 min )
    Looking for a meeting assistant
    I'm looking for a program that will transcribe live audio playing from my computer (windows). Do you know something like that? I've seen Buzz (https://chidiwilliams.github.io/buzz/docs/usage), but it needs an audio loopback driver in order to work, so I wonder if there are others. 🚀 Bonus points if it recognizes different people talking. 🚀 Extra bonus points if it can transcribe multiple languages. submitted by /u/AleHoju [link] [comments]  ( 9 min )
    I signed up for a debate on ai.
    So today I signed up for a debate on ai. Wheather ai is beneficial or dangerous to human beings. I have the freedom to choose any side. This debate will be watched by about 130 people, all cs freshmen (mind you, also my first time speaking in front of this many people). Now, I'm confident I know more about ai than an average person but I need your help in preparing properly. Which side do I take and what are all the points I should keep in mind? It's 4 pm here and the debate is tomorrow. Any help will be appreciated. Thank you. submitted by /u/CalmGuy69 [link] [comments]  ( 9 min )
    Artificial Intelligence May Be Humanity’s Most Ingenious Invention–and Its Last?
    Artificial intelligence (AI) has the potential to solve the world's problems or destroy humanity. It is being developed by a few hundred individuals in Silicon Valley, and we have little say in its ethics or existence. AI has already demonstrated creative abilities in painting, writing, and music. It is also being used in drug discovery, therapy, dating apps, and misinformation in politics. The rapid adoption of AI raises concerns about job displacement and the potential for catastrophic events. Experts predict a significant chance of AI causing a catastrophe or even wiping out humanity. Source : https://www.vanityfair.com/news/2023/09/artificial-intelligence-industry-future Summarized by Nuse AI submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Generative AI Consumer Landscape by a16z
    In less than a year since ChatGPT was introduced, it has become the fastest consumer application to register 100 million monthly active users. But how are consumers using other GenAI products apart from ChatGPT? An a16z Consumer report examines the top 50 GenAI web products (based on SimilarWeb data) to find out. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/f0kh5qag16ob1.png?width=2058&format=png&auto=webp&s=1cab11a31d384c068912c9fca32a000393b795d5 Proprietary models are dominating: 80% of the top 50 GenAI products didn’t exist a year ago—suggesting many of the most compelling consumer experiences are completely novel. Interestingly, 48% of these are bootstrapped—with no outside funding. ChatGPT holds a massive lead: ChatGPT alone accounts for 60% of the entire list's monthly traffic, with roughly 1.6 billion visits and 200 million monthly users as of June 2023. LLM assistants are dominating: LLMs, including Google’s Bard and Quora’s Poe, constitute 68% of total consumer traffic to the top 50. The other categories seeing significant traffic are AI companions and content-generation tools. GenAI marketing is mostly organic: Marketing for most of these products has been reliant on referrals, word of mouth, and other traditional marketing as they enter the market. About 90% of these companies are already monetizing, and most do so via a subscription model. GenAI and mobile adaptability: Given the extensive consumer time spent on mobile, an increase in mobile-first GenAI products is expected as the technology evolves. (source) P.S. If you like this type of analysis, sign up for my free newsletter that deciphers the fastest-moving news and research in AI and tech. Professionals from Google, Meta, and OpenAI are already on board. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/14/2023
    Adobe’s Firefly generative AI tools are now widely available.[1] Stability AI, gunning for a hit, launches an AI-powered music generator.[2] Elon Musk warned of civilizational risks posed by artificial intelligence. Sundar Pichai of Google highlighted the technology’s potential to solve health and energy problems. And Mark Zuckerberg of Meta stressed the importance of open and transparent A.I. systems.[3] German military plows millions into AI ‘environment’ for weapons tests that could change combat forever.[4] Amazon launches generative AI to help sellers write product descriptions.[5] Sources: [1] https://www.theverge.com/2023/9/13/23871537/adobe-firefly-generative-ai-model-general-availability-launch-date-price [2] https://techcrunch.com/2023/09/13/stability-ai-gunning-for-a-hit-launches-an-ai-powered-music-generator/ [3] https://www.nytimes.com/2023/09/13/technology/silicon-valley-ai-washington-schumer.html [4] https://www.foxnews.com/world/german-military-plows-millions-ai-environment-weapons-tests-change-combat [5] https://www.aboutamazon.com/news/small-business/amazon-sellers-generative-ai-tool submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    I'm very new in this field (Prompt Engineering) and have a handful of questions, any advice and thoughts are welcome!!
    My experience in CS, coding, and programming is very minimal. I understand general concepts but only through the lens of a degree in physics that required I mess around with WolframAlpha sometimes (which I really enjoyed). I've been considering getting a second degree in CS or something related but want to find a clear(ish) path before committing to it. I would love to hear any related thoughts as well! ​ Prompt Engineering seems like a pretty fresh field of study. Is it up and coming as a career path? People that specialize in this, what field(s) did you come from and how would you recommend diving into it? Considering my experience, would I be able to actually find work, freelancing or employed? How on earth do I get started in this world? It seems so insanely big and complicated but I am just fascinated by the idea of using written dialogue to manipulate the output of an LLM! From my very high overview of PE and the recent advances in AI, PE as a field of study and interest is going to expand exponentially, is that accurate? submitted by /u/Top_Room_6714 [link] [comments]  ( 9 min )
  • Open

    On-device content distillation with graph neural networks
    Posted by Gabriel Barcik and Duc-Hieu Tran, Research Engineers, Google Research In today's digital age, smartphones and desktop web browsers serve as the primary tools for accessing news and information. However, the proliferation of website clutter — encompassing complex layouts, navigation elements, and extraneous links — significantly impairs both the reading experience and article navigation. This issue is particularly acute for individuals with accessibility requirements. To improve the user experience and make reading more accessible, Android and Chrome users may leverage the Reading Mode feature, which enhances accessibility by processing webpages to allow customizable contrast, adjustable text size, more legible fonts, and to enable text-to-speech utilities. Additionally, An…  ( 93 min )
  • Open

    Does C++ in mujoco have benefits over python for reinforced learning?
    I'm trying to build a humanoid model and then train it to perform some tasks , I have decided to go with mujoco for the simulation and now I'm wondering if I should use the C++ API or the python one. the python implementation says it uses C API but is it good? Also if it's slower than the c++ one how slow is it? I'll probably have to make something real time and hence can't compromise much on the speed, but if it's only and small amount it's acceptable. would really appreciate some guidance in this matter thank you submitted by /u/rakk109 [link] [comments]  ( 9 min )
    Algorithmic pricing environments for RL
    Hello, I am looking for environments to test out some ideas I have for algorithmic pricing. By algorithmic pricing environments, I mean there are multiple competing algorithms trying to maximize profits. I can't really find any out of the box implementations. There are trading environments but those are not what I am looking for. Any help would be appreciated, thanks. submitted by /u/Next_Gap8224 [link] [comments]  ( 9 min )
    Some confusion about using mocap in Mujoco
    Hi! Recently, I tried to follow fetch_pick_and_place.env in gymnasium_robotics to build a similar environment with Franka. I found that the core of this implementation is to use the mocap to control the end-effector, and then mocap derives joint angles using the built-in inverse kinematics algorithm. For the fetch_pick_and_place.env, mocap does not cause mutations and oscillations in configuration space. However, when I use mocap to control Franka, oscillations in joint space occur frequently, although I've minimized the step size of the mocap to ensure that the movement of the end-effector in Cartesian space is minor. Fetch and Franka are both redundant arms, I don't know why there is such a big difference in mocap performance. Here is the video to illustrate the above phenomenon Franka I've opened issues on mujoco and gymnasium robotics repositories, but it didn't initiate any discussion. Any help would be appreciated! Thanks! ​ submitted by /u/UpperSearch4172 [link] [comments]  ( 9 min )
  • Open

    Build a classification pipeline with Amazon Comprehend custom classification (Part I)
    In first part of this multi-series blog post, you will learn how to create a scalable training pipeline and prepare training data for Comprehend Custom Classification models. We will introduce a custom classifier training pipeline that can be deployed in your AWS account with few clicks.  ( 10 min )
    Fine-tune Falcon 7B and other LLMs on Amazon SageMaker with @remote decorator
    Today, generative AI models cover a variety of tasks from text summarization, Q&A, and image and video generation. To improve the quality of output, approaches like n-short learning, Prompt engineering, Retrieval Augmented Generation (RAG) and fine tuning are used. Fine-tuning allows you to adjust these generative AI models to achieve improved performance on your domain-specific […]  ( 8 min )
    Simplify access to internal information using Retrieval Augmented Generation and LangChain Agents
    This post takes you through the most common challenges that customers face when searching internal documents, and gives you concrete guidance on how AWS services can be used to create a generative AI conversational bot that makes internal information more useful. Unstructured data accounts for 80% of all the data found within organizations, consisting of […]  ( 14 min )
  • Open

    Yocto, ROS 2, and Hardware Acceleration: A Production-Grade Trio for Robotics
    submitted by /u/pablocarrera [link] [comments]  ( 8 min )
  • Open

    Chinese Remainder Theorem synthesis algorithm
    Suppose m = pq where p and q are large, distinct primes. In the previous post we said that calculations mod m can often be carried out more efficiently by working mod p and mod q, then combining the results to get back to a result mod m. The Chinese Remainder Theorem assures us that […] Chinese Remainder Theorem synthesis algorithm first appeared on John D. Cook.  ( 5 min )
    Gaining efficiency by working modulo factors
    Suppose m is a large integer that you are able to factor. To keep things simple, suppose m = pq where p and q are distinct primes; everything in this post generalizes easily to the case of m having more than two factors. You can carry out calculations mod m more efficiently by carrying out […] Gaining efficiency by working modulo factors first appeared on John D. Cook.  ( 5 min )
    Group theory and RSA encryption
    RSA encryption a map from numbers mod n to numbers mod n where n is a public key. A message is represented as an integer m and is encrypted by computing c = me mod n where e is part of the public key. In practice, e is usually 65537 though it does not have […] Group theory and RSA encryption first appeared on John D. Cook.  ( 7 min )
    RSA encrypted messages that cannot be decrypted
    Not all messages encrypted with the RSA algorithm can be decrypted. This post will show why this is possible and why it does not matter in practice. RSA in a nutshell RSA encryption starts by finding two large primes, p and q. These primes are kept secret, but their product n = pq is made public. […] RSA encrypted messages that cannot be decrypted first appeared on John D. Cook.  ( 6 min )
  • Open

    Microsoft at ACM SIGCOMM 2023: Innovating the future of networking
    Modern applications heavily rely on robust network infrastructure, requiring continuous innovation. In this evolving landscape, Microsoft is at the forefront, spearheading innovation efforts in networking and strengthening the foundational network infrastructure that underpins the cloud ecosystem. By investing in and enhancing this critical infrastructure, Microsoft not only ensures the resilience and scalability of cloud services […] The post Microsoft at ACM SIGCOMM 2023: Innovating the future of networking appeared first on Microsoft Research.  ( 10 min )
    AI Frontiers: The future of scale with Ahmed Awadallah and Ashley Llorens
    What’s the driving force behind AI’s recent, rapid progress? Research manager Ahmed Awadallah shares his insights on this, the two-stage approach to training large-scale models, and the need for better model evaluation in this episode of the #MSRPodcast. The post AI Frontiers: The future of scale with Ahmed Awadallah and Ashley Llorens appeared first on Microsoft Research.  ( 31 min )
  • Open

    Are data science certifications the gateway to competitive pay?
    Working as a data scientist is the dream of many IT professionals these days. It is no secret that data science is a skyrocketing field attracting young professionals and inspiring many to switch careers to data science. On one front are young professionals who study their courses in colleges to pursue their dream of becoming… Read More »Are data science certifications the gateway to competitive pay? The post Are data science certifications the gateway to competitive pay? appeared first on Data Science Central.  ( 19 min )
    CUPED for starters: Enhancing controlled experiments with pre-experiment data
    CUPED: Improve Your A/B Testing - Detect Smaller Gains, Utilise Smaller Samples and Make Smarter Decisions! The post CUPED for starters: Enhancing controlled experiments with pre-experiment data appeared first on Data Science Central.  ( 26 min )
    Searching for sustainability in data center cooling
    Data centers are known for their impact on the environment. They run 24/7 and exude a lot of heat. Massive warehouses full of hot technology require advanced cooling systems or an HVAC system pushed to its limit.  Data center managers and sustainability leaders no longer settle for antiquated techniques. They’re striving to develop greener and… Read More »Searching for sustainability in data center cooling The post Searching for sustainability in data center cooling appeared first on Data Science Central.  ( 20 min )
    Collaborative visual knowledge graph modeling at the system level
    The best way to model business and consumer dynamics is collaboratively, with stakeholders all in the same virtual room contributing. Of course, this has been happening asynchronously for some time now, but the potential exists for more real-time interaction.  Modelers don’t work in a vacuum, of course. The iterations between a modeler who develops a… Read More »Collaborative visual knowledge graph modeling at the system level The post Collaborative visual knowledge graph modeling at the system level appeared first on Data Science Central.  ( 20 min )
  • Open

    Shout at the Devil: Capcom’s ‘Devil May Cry 5’ Joins GeForce NOW
    GFN Thursday is downright demonic, as Devil May Cry 5 comes to GeForce NOW. Capcom’s action-packed third-person brawler leads 15 titles joining the GeForce NOW library this week, including Gears Tactics and The Crew Motorfest. It’s also the last week to take on the Ultimate KovaaK’s Challenge. Get on the leaderboard today for a chance Read article >  ( 6 min )
  • Open

    A pose-mapping technique could remotely evaluate patients with cerebral palsy
    The machine-learning method works on most mobile devices and could be expanded to assess other motor disorders outside of the doctor’s office.  ( 10 min )

  • Open

    [D] OpenGL Library Error
    Has anyone ever encountered this error while working with the OpenGL library? raise ImportError("Unable to load OpenGL library", *err.args) ImportError: ('Unable to load OpenGL library', "Could not find module 'OSMesa' (or one of its dependencies) I have done pip install opengl and pyopengl, but it doesn't seem to solve the problem. Specifically, I am trying to run this code: https://github.com/brjathu/LART submitted by /u/BigDreamx [link] [comments]  ( 9 min )
    [D] Wandb remote agent source code managing
    Hi all, I'm new to machine learning and have decided to use a combination of Stable baselines3 and Wandb. I'm at the point where I'm running sweeps using Wandb and want to utilize another PC I have laying around to run agents on. What is the best way to get my python code for the agent to run onto the spare PC? I know I can manually load copy the code over, but I'm looking for a more efficient method that will maintain any changes made to the source code. Maybe packaging up the python code within the sweep and having the agent download and execute it? I'm not all too familiar with possibilities and limits of trying to achieve this so any and all input is appreciated. Thanks! submitted by /u/chip_fork [link] [comments]  ( 9 min )
    [D] Mean scores or appending all the predictions in cross-validation for model performance evaluation
    I have this question that I cannot seem to settle in my head. All papers that I read, report the average (std) performance results across each folds when they report F1, Precision, etc.. Somebody that I highly trust in ML (somebody with a PhD in the field) was reporting the results after saving all the predicted labels (y_pred) and actual labels (y_true) to a list and compute the F1 score one time with the pooled predictions. I now am working on a dataset (binary classification) and trying to validate my model using leave-one-subject-out CV (Some people in my dataset have more lines than others). When I take the average of all iterations I get poor results (F1 score= 0.5), but if I pool all the predictions and compute the F1 score at the end, I achieve decent performance (F1 score =0.7). So, in my project, it is in my best interest to use the second approach, and somebody that I trust tells me that it is okay to do this approach. But I cannot seem to find a paper that says that this approach is acceptable or good. What do you guys think and do you have any suggestions? submitted by /u/enthusiastic31 [link] [comments]  ( 9 min )
    [P] Looking for Efficient Encoding Methods for Java Method Names for Downstream Classification Task
    I'm working on a project that involves analyzing large samples of Java codes. My end goal is to perform classification based on these codes. For this, I've been trying to efficiently encode the full names of the methods (in the package.class.methodname format) in the Java code. Currently, I am experimenting with doc2vec. I'm treating the components of each method's full name (separated by dots) as individual documents. This allows me to produce vectors for each method name, and I evaluate the results by computing the cosine similarity between pairs of similar method names. The results were not good so far. Before moving to doc2vec, I tried using an LLM which gave me good results. However, the inference time was far too long, especially given the scale at which I'm working. I also considered using a Bag of Words model, but quickly realized it wouldn't be effective. Many of the method names in my samples are obfuscated, making this approach unsuitable. The issue I'm facing is that using the direct method names as features is not generalizing well for classification. A slight change in a method's name results in losing that particular feature, making my model fragile. What are some optimal encoding methods for small sentences (around 5-6 words) like these method names, that can scale well? Also, Are there any specialized encoding techniques tailored for software code that I could use for this task? Any suggestions or insights would be really helpful. Thank you! submitted by /u/Practical_Mango_8720 [link] [comments]  ( 9 min )
    [R] Efficient Memory Management for Large Language Model Serving with PagedAttention - UC Berkeley et al 2023 - 2-4x higher throughput than HuggingFace Transformers without requiring any model architecture changes!
    Paper: https://arxiv.org/abs/2309.06180 Github: https://github.com/vllm-project/vllm Blog: https://vllm.ai/ Abstract: High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks dynamically. When managed inefficiently, this memory can be significantly wasted by fragmentation and redundant duplication, limiting the batch size. To address this problem, we propose PagedAttention, an attention algorithm inspired by the classical virtual memory and paging techniques in operating systems. On top of it, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage. Our evaluations show that vLLM improves the throughput of popular LLMs by 2-4× with the same level of latency compared to the state-of-the-art systems, such as FasterTransformer and Orca. The improvement is more pronounced with longer sequences, larger models, and more complex decoding algorithms. https://preview.redd.it/x8w8ckejv2ob1.jpg?width=667&format=pjpg&auto=webp&s=28fae778b67ac28fc72d084f071b12c92cb5ea07 https://preview.redd.it/ctlrqpejv2ob1.jpg?width=1468&format=pjpg&auto=webp&s=31755d169673ee5d30efa3f05bd6cb10813b328d https://preview.redd.it/z5r7knejv2ob1.jpg?width=1504&format=pjpg&auto=webp&s=9ceb5370aa5a7cc0688fe9a3771a0328262c3a01 ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] LLM for viral tweet generation
    Problem: Given a database of the most viral tweets (of a certain shared category), I am hoping to use LLM's to generate further viral tweets. Currently I am seeing this as a synthetic data generation problem: two approaches I am thinking of is 1) grounding (using viral examples to guide the prompt) and 2) filtering (finetuning an LLM to predict virality and filtering for the most viral generations) I want to ensure that the output retains the "viral" structure/style and is diverse/new (i.e no copies). Any general directions or references are appreciated submitted by /u/greatSWE [link] [comments]  ( 9 min )
    [D] 1is Msc Math enough to secure jobs in R&D sector for AI/ML?
    I'm currently doing Integrated MSc in Mathematics (in India) and until now I've done a remote research intern in a French university and I'll do a research intern at a French research laboratory (INRIA-LORIA) next year, I want a job (in India) at the R&D sector (Data Scientist or ML Engineering anything would do) Idk if research interns are as valuable as Industry internships when it comes to R&D? Basically after my masters, I'll probably have only these two internships to show as a work experience, Probably won't be able to get an internship in an IT company unless i opt for remote work (which is also not guaranteed atm). Mu question, generally is Msc + 2 research internships + 1or 2 publications good enough to secure a decent job in R&D? submitted by /u/Emotional-Zebra5359 [link] [comments]  ( 9 min )
    Can I work later as an ML engineer? [D]
    Hello!! I have a BSc in Mathematics and currently I'm going to start a MSc in FinTech ( it has 3 courses out of 9 that have to do with ML, NN and many more ). Since I am really fascinated about programming ( didn't like it when I was at Maths department because of teachers and I am starting learning on my own through courses and we will have many programming languages in my masters degree ex. R,Python, SQL and others) and I would start as a data scientist at the beginning, could I through years of experience later ( ideally 1-2 years later, provided that I have a solid and good programming skills and projects) become a ML engineer? ( Now of course I can't become one because I know that it is difficult since I am competing with people that have CS degrees). submitted by /u/math-is-cool-62 [link] [comments]  ( 9 min )
    "[P]" Seeking technical Co-Founder: Private Equity SaaS Startup
    Hi there! I bring 2 years of experience from a European investment fund and a solid idea for a B2B SaaS solution targeting private equity investment funds. This market is notoriously challenging to penetrate without insider knowledge. The timing couldn't be better. Similar kind of software is currently sold €20k per user by a semi-monopolistic boomer company that is ready to be disrupted. Offer: - Equity shared equally. - Ready to quit my job and go full-time on it, if I find the right co-founder. Ideal Partner: Seeking someone proficient in SQL for handling large datasets and able to integrate OpenAI's API within such datasets (technical AI skills required). While I'm based in Berlin and prefer a European co-founder, it's not mandatory, but meeting in person is a must before we commit. Don't be afraid to DM me if intrigued! Together, we can make it happen. Let's revolutionize the sector! submitted by /u/Sudden_Possible489 [link] [comments]  ( 9 min )
    Books for machine learning. [D]
    İ am lookong for the pdfs about machine learning, maths for ML, ml projects. İs there any sites i can find pdf like that? submitted by /u/Necessary-Car-5080 [link] [comments]  ( 9 min )
    [D] Tensorflow Dropped Support for Windows :-(
    Hey, I've been using TF pretty much my whole deep learning career starting in 2017. I've also used it on Windows the entire time. This was never a major issue. Now when I tried (somewhat belatedly) upgrading from 2.10 to 2.13, I see the GPU isnt being utilized and upon further digging see that they dropped Windows GPU support after 2.10: "Caution: TensorFlow 2.10 was the last TensorFlow release that supported GPU on native-Windows. Starting with TensorFlow 2.11, you will need to install TensorFlow in WSL2, or install tensorflow or tensorflow-cpu and, optionally, try the TensorFlow-DirectML-Plugin" This is really upsetting! Most of the ML developers I know actually use Windows machines since we develop locally and only switch to Linux for deployment. I know WSL is an option, but it (1) can only use 50% RAM (2) doesnt use the native file system. I feel very betrayed. After sticking with, and even advocating for Tensorflow when everyone was (and still is) switching to PyTorch, TF dropped me! This is probably the final nail in the coffin for me. I will be switching to PyTorch as soon as I can :-( -Disgruntled user submitted by /u/rsandler [link] [comments]  ( 9 min )
    [R] --Research Participants Required-- Age perception of AI generated infant faces compared to real infant faces. (Suitable for everyone) (Available for Laptop/PC and Tablet devices only)
    https://research.sc/participant/login/dynamic/A1D66883-6E8F-409B-8EF9-AC989A76C7E9 Psychology researchers at Swansea University are carrying out an original study to see whether artificial intelligence is able to generate infant faces between the ages of 0 and 7 years. AI generated pictures will be presented alongside real faces from an existing face database, with a sliding scale underneath that you will use to roughly estimate the age of the child’s face. The experiment should only take between 15 and 20 minutes Thank you for your time. submitted by /u/Logipsychlical [link] [comments]  ( 9 min )
    [D] Will be presenting a talk on Data Pre-processing in Deep Learning - what would be the topics, notebooks or datasets would you include if you would be giving such talk?
    So I'll be presenting a talk on Data pre-processing in deep learning in my city's Keras Community Day, and I am still thinking about all the content I want to present there. What I want from this session is to present different ways of preprocessing the data for a deep learning model. I want to show different types of implementations, how those affect the final trained model, when to use which type of data preprocessing and things similar to this. It would be great if you can suggest me some topic, notebooks or datasets for the same. All the notebooks that show good implementation and affect of data preprocessing are absolutely welcome. Also, as this is **Keras** Community Day, I'll have to include more about data preprocessing using Keras and less about other libraries. Also, if you could help me with this: I am confused between showing preprocessing using layers or doing the preprocessing without layers. I know this sounds vague, but if you have any idea about this, let me know. Thank you for reading! submitted by /u/inclinedadarsh [link] [comments]  ( 9 min )
    [D] Need help Selecting MSc. Courses
    I'm currently in my first year of MSc. in Engineering Mathematics and Computational Science. First Study Period (Currently) I have nonlinear optimization and High-performance computing. The track I want to choose is a mix between Machine Learning and Big Data. I can select 2 courses for Study Period 2. Here are the potential courses to select from: Game Theory and Rationality Large-Scale Optimization Advanced Probabilistic machine learning Basic Stochastic Processes Options and Mathematics (Options trading) Foundations of Probability Theory ​ I need to select 4 potential courses and rank them from high preference to low preference. In case I don't get one of them, the other will be preferred. Please Machine Learning Reddit Gods, Help me. submitted by /u/AdMaster9439 [link] [comments]  ( 9 min )
    [D] We built Beam: An ultrafast serverless GPU runtime
    Hi r/MachineLearning, TL;DR: Run AI apps on pay-per-second cloud GPUs that hot reload with your code changes. Documentation: https://docs.beam.cloud I’m Eli, and my co-founder and I built Beam to run workloads on serverless cloud GPUs with hot reloading, autoscaling, and (of course) fast cold start. You don’t need Docker or AWS to use it, and everyone who signs up gets 10 hours of free GPU credit to try it out. Here a few examples of things you can run on Beam: Fine-tune a LLaMA LLM Transcribe videos with Whisper Train a custom stable diffusion model Beam is built for a fast developer experience. We’ve felt that using Docker and AWS directly is too slow for iterative development. You’ll often find yourself making changes to your code and waiting 10 minutes for a new image to bu…  ( 10 min )
    [D] mlflow plugin manager - early days / looking for feedback and alpha users
    Hey r/machinelearning! I'm thrilled and anxious to share an early version of the MLflow Plugin Manager. It's designed to simplify your mlflow installation, allowing you to install, update, and uninstall MLflow plugins directly from the web interface. Think of it as the "wbond's package manager for sublime", but tailored for MLflow! 📽️ https://i.redd.it/9gj8vqcz01ob1.gif ​ Yes, it's in its infant stages and doesn't boast of a fancy UI yet, but I'm eager to get your feedback! 🔍 What are your first impressions? Is this a good idea? 💡 Any features you'd love to see? 🌐 Ideas on promoting or expanding its reach? I built this to bring a bit more ease to our community. Can't wait to hear your thoughts and where we can take this next! Repo: https://github.com/thijsdezoete/mlflow-plugin-manager/ submitted by /u/jessepnk [link] [comments]  ( 9 min )
    [Discussion] Non deterministic behaviour in LLMs when temperature set to 0?
    Hi all, Someone asked me today "why are LLMs still non deterministic in their output when temperature is set to 0. Assume fixed model between runs on the same machine" I was like WTF are you saying - the randomness in LLM comes from temperature - chat gpt etc.. might have other randomness in the process but we don't have exact info on this. What I know is that in a standard transformers architecture, temperature is the only parameter that can enduce non deterministic behaviour at inference time. He was convinced that there was more to it "i spoke about this to other LLM experts and they also are not sure" I'm like wtf - I start looking up online and do find some people who claim that temperature is not the only thing that influences stochasticity during inference, but I can't find an answer as to what it is exactly. Anyone has a clue of what I am missing here? Thanks! submitted by /u/WagnerianJLC [link] [comments]  ( 9 min )
    [P] Will Tsetlin machines reach state-of-the-art accuracy on CIFAR-10/CIFAR-100 anytime soon?
    ​ A composite of specialized Tsetlin machines that enables plug-and-play collaboration. I have a love-and-hate relationship with CIFAR-10/100. I love the datasets for the challenge. On the other hand, they are two datasets where Tsetlin machines have struggled with getting state-of-the-art performance. (The Tsetlin machine is a low-energy logic-based alternative to deep learning that has done well on MNIST, Fashion-MNIST, CIFAR-2, and various NLP tasks.) I have been working for some time now on figuring out a solution, and this summer, I finally had a breakthrough: a new architecture that allows multiple Tsetlin machines to collaborate in a plug-and-play manner, forming a Tsetlin machine composite. The collaboration relies on a Tsetlin machine's ability to specialize during learning and…  ( 10 min )
    [P] Tutorial - Create an Object Detector for Any Game Using YOLO
    Hello everyone! Have you ever thought about how to create an object detection system using YOLO that works with any game? If you're interested, I've created a tutorial with all the steps to develop this system. I'll leave a link to the video where I demonstrate the process step by step using the game Diablo 2 as an example. I hope you enjoy it, and if you have any suggestions, feel free to send a message or comment here! link to the tutorial: https://www.linkedin.com/posts/moisesdias_english-version-below-tutorial-crie-activity-7107686497885011969-ZLVW/ submitted by /u/moisesdepaulodias [link] [comments]  ( 9 min )
    [R] Adversarial Reinforcement Learning
    A curated reading list for the adversarial perspective in deep reinforcement learning. https://github.com/EzgiKorkmaz/adversarial-reinforcement-learning submitted by /u/ml_dnn [link] [comments]  ( 9 min )
    [P] The XOR trick
    Can a single layer neural network solve the XOR problem? Most answers say no, but with this one weird trick the answer is yes! And we don't even need a bias! The trick is to multiply the outputs of a single (2,2) linear layer. Here is how: class XorSolver(nn.Module): def __init__(self, *args, **kwargs) -> None: super().__init__(*args, **kwargs) self.layer = nn.Linear(2, 2, bias=False) # we don't even need a bias! def forward(self, x: torch.Tensor) -> torch.Tensor: z = self.layer.forward(x) y = z[:, 0] * z[:, 1] return y This is the loss and model output after 5000 epochs: loss: 6.516383166399464e-08 Input: [[0. 0.] [0. 1.] [1. 0.] [1. 1.]] Model output: [0. 1. 1. 0.] Expected output: [0. 1. 1. 0.] Layer weight: [[-1.248097 1.2202195 ] [-0.80121976 0.81952316]] The full implementation with training and inference - around 50 lines of code - can be found on GitHub. Why it works? Basically the model simulates a more sophisticated neuron which allows more interactions between the inputs. By multiplying the outputs of two neurons, we introduce a form of non-linearity that allows us to separate data that are not linearly separable, like in the XOR problem: f(x1, x2, w1, w2, w3, w4) = (x1 * w1 + x2 * w2) * (x1 * w3 + x2 * w4) w1, w2 are learnable parameters of the first neuron w3, w4 are learnable parameters of the second neuron x1, x2 are inputs to the model Related studies Solving XOR with a single Perceptron Artificial Neural Networks With Adaptive Polynomial Activation Function Single Cortical Neurons as Deep Artificial Neural Networks Dendritic action potentials and computation in human layer 2/3 cortical neurons submitted by /u/tecbar [link] [comments]  ( 9 min )
    [R] Renting cloud services
    Hello guys. As I plan on soing a scientific research project, I would need some cloud compute. Say for a month of usage(can I rent for month?). What are some popular options? I am looking at something with 48gb vram pooled,mybe a600 or a100 and some decent cpu, and 2tb space. submitted by /u/Outrageous_Ad1452 [link] [comments]  ( 9 min )
    [D] Guidance in training different models and comparing using smaller versions
    Currently I'm training medium (1B-3B) sized audio models. I have several different architectures in mind. Obviously I don't want to train the full-sized models and then compare them, thats a waste of money. So I'm thinking of training smaller versions (~100M) and then comparing those instead. My question is there some sort of best practice for this? Some smaller multiple of your full model size where it is best to compare? Thanks. submitted by /u/ginger_turmeric [link] [comments]  ( 9 min )
  • Open

    ‎The Economist Podcasts: Babbage: Mustafa Suleyman on how to prepare for the age of AI
    submitted by /u/siiilverrsurfer [link] [comments]  ( 9 min )
    Dystopia AI Movie
    submitted by /u/the_anonymizer [link] [comments]  ( 8 min )
    Looking for AI developers and researchers
    Hi, I would love to create a small group of people who work together in AI. The project would be to create an AI that can infer new novel knowledge from existing datasets, as opposed to be being limited by operating within the training data. Specifically to be used in the quest to learn more about the universe. So I am looking for a team of likeminded individuals who want to grow in the field of AI. I'd love to setup a discord, subreddit and github profile to showcase our work. My introduction question is: How do we get AI's to expand upon current knowledge instead of just serving from the knowledge itself. Anyone interested in joining me in this? submitted by /u/Miserable-Cobbler-16 [link] [comments]  ( 9 min )
    Don't worry, folks. Big Tech pinky swears it'll build safe, trustworthy AI
    Eight big names in tech, including Nvidia, Palantir, and Adobe, have agreed to red team their AI applications before they're released and prioritize research that will make their systems more trustworthy. The White House has secured voluntary commitments from Adobe, Cohere, IBM, Nvidia, Palantir, Salesforce, Scale AI, and Stability AI to develop machine-learning software and models in a safe, secure, and trustworthy way. The commitments only cover future generative AI models. Each of the corporations has promised to submit their software to internal and external audits, where independent experts can attack the models to see how they can be misused. The organizations agreed to safeguard their intellectual property and make sure things like the weights of their proprietary neural networks don't leak, while giving users a way to easily report vulnerabilities or bugs. All eight companies agreed to focus on research to investigate societal and civil risks AI might pose if they lead to discriminatory decision-making or have weaknesses in data privacy. The US government wants Big Tech to develop watermarking techniques that can identify AI-generated content. The US has asked the corporations to commit to building models for good, such as fighting climate change or improving healthcare. Source : https://www.theregister.com/2023/09/12/nvidia_adobe_palantir_ai_safety/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Harvard iLab-funded project: Sub-feature of the platform out -- Enjoy free ChatGPT-3/4, personalized education, and file interaction with no page limit 😮. All at no cost. Your feedback is invaluable!
    submitted by /u/Raymondlkj [link] [comments]  ( 9 min )
    Dissatisfied with GPT paid subscription - who should I go with instead?
    I’ve been using the paid version of GPT for a while but I think it’s time to move on. For $20 a month i would at least like for there to be an option to input an image, but I would also maybe pay a bit more than that per month for a suite of tools or something if it could also do image generation in addition to just text. I’m sorry if it seems like I should be able to understand anything better - please note I am disabled, my use of the tools is personal (creative and household) and not professional, and I’m doing my best by asking here. I do not mean to bother anyone with my own ignorance. Thank you. submitted by /u/CaveLady3000 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/13/2023
    Project Gutenberg and Microsoft have created thousands of free audiobooks that use neural text-to-speech technology to generate the voices.[1] A group of U.S. authors, including Pulitzer Prize winner Michael Chabon, has sued OpenAI in federal court in San Francisco, accusing the Microsoft-backed program of misusing their writing to train its popular artificial intelligence-powered chatbot ChatGPT.[2] Numenta launches brain-based NuPIC to make AI processing up to 100 times more efficient.[3] Adept AI Labs released Persimmon-8B. Persimmon-8B is an open-source, fully permissively licensed model in the 8B class. This model holds immense potential for a wide array of applications, aiming to assist users in various computer-related tasks.[4] Sources: [1] https://www.zdnet.com/article/heres-how-to-access-thousands-of-free-audiobooks-thanks-to-microsoft-ai-and-project-gutenberg/ [2] https://www.reuters.com/technology/more-writers-sue-openai-copyright-infringement-over-ai-training-2023-09-11/ [3] https://venturebeat.com/ai/numenta-launches-brain-based-nupic-to-make-ai-processing-up-to-100-times-more-efficient/ [4] https://www.marktechpost.com/2023/09/09/adept-ai-labs-open-sources-persimmon-8b-a-powerful-fully-permissively-licensed-language-model-with/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Is there an AI image tool that makes existing images look like renders?
    I see a ton of AI image tools out there. Some let you upload image files and modulate/modify them in some way. I am wondering if a tool exists that will take a real life product image and make it appear more like a render/computer generated image. Essential I would love to be able to take a pic of a product and use automatic smoothing and AI simulated rendering to output a clean image that looks like a 3d render. This would be used as a product image for an e-commerce website. submitted by /u/ElonMusk0fficial [link] [comments]  ( 9 min )
    Alibaba Cloud open sources its generative AI models Tongyi Qianwen
    Alibaba Cloud has open sourced two of its generative AI models, Qwen-7B and Qwen-7B-Chat, for commercial and research use. The models' codes and documentation will be accessible through Alibaba Cloud's AI model repository ModelScope and the US collaborative AI platform Hugging Face. Companies with fewer than 100 million monthly active users can use the models for commercial purposes free of charge, while those with more users will need to request a license. Alibaba aims to democratize AI technology and support LLM start-ups. Alibaba Cloud's ModelScope platform currently features over 1,000 ready-to-use AI models contributed by 20 leading AI institutions. Source : https://www.scmp.com/tech/big-tech/article/3229907/alibaba-cloud-open-sources-its-two-generative-ai-models-based-chatgpt-style-tongyi-qianwen submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Is there an AI tool for generating videos using stock footages?
    I have a text script that I want to turn into a video. For the sake of context, the video is on balancing a person’s daily activities. I’m getting tired trying to find matching stock footages for the videos. I was wondering if there is a way to do this using AI tools? Synthesia won’t do because it looks like a video narration, more than a video essay. Any suggestions would help. Thanks in advanced! submitted by /u/Entaro2109 [link] [comments]  ( 9 min )
    Google Codey HumanEval Benchmark
    Hi everyone, I'm hunting for a HumanEval Benchmark for Google's Codey model and am having a tough time hunting it down. Can anyone point me to an Arxiv paper or a coding leaderboard that includes Codey? Thanks! submitted by /u/Iamreason [link] [comments]  ( 9 min )
    Many executives/investors are pushing for the use of generative AI in products/applications…
    But product & tech teams succumb to the pressure and move on to developing proof of concepts & even launch products that fail to achieve ROI Why? 1- use cases are not well defined 2- not enough data or right data strategy 3- data and model architecture not founded well I love Apple’s approach to AI, they shy away from the hype and focus on the fundamentals. First the customer, product, then the tech that will add the value the customer needs. What do you think are the top reasons generative AI applications succeed? View Poll submitted by /u/AILaunchpad [link] [comments]  ( 9 min )
    Create a custom search engine
    I have an open book exam that has a lot of information that I will need to physically search through. Is there a way i can load all he PDFS and create a customised chatgbt style search, so i can easily look through all the information and research i have? submitted by /u/yellowmushroom22 [link] [comments]  ( 9 min )
    HeyGen's one-click translation from English to Italian, Hindi, German and Spanish is the craziest AI application I've seen in months.
    submitted by /u/Fadawah [link] [comments]  ( 8 min )
    I wanna develop small scale personal AI apps for each my friends and AI said i should learn about Python, TensorFlow Lite, SQLite, GCP, Pandas, Scikit Learn and Keras. How right is this?
    I wanna develop small scale personal AI apps for each my friends and AI said i should learn Python, TensorFlow Lite, SQLite, GCP, Pandas, Scikit Learn and Keras. How right is this? submitted by /u/Leading-Ad2278 [link] [comments]  ( 9 min )
    AI image generators have a moderation problem
    Tests carried out by Logically confirm these platforms accept 85% of prompts tailored for election manipulation. submitted by /u/Asleep-Television-24 [link] [comments]  ( 9 min )
    AI Chatbots successfully build software in under 7 minutes for less than $1
    AI Chatbots, such as OpenAI's ChatGPT, can create incredibly cost-friendly software in record time, reveals a new study. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/rxpr6db3aynb1.png?width=1300&format=png&auto=webp&s=721ff5d8f9d25b5e48fa26e2b335c1d20620a83a The AI Tech Company Experiment Brown University and several Chinese University researchers put ChatGPT-powered AI bots to the test by making them run a hypothetical software development company, ChatDev. The AI chatbots were given specific roles and allocated respective stages based on the traditional waterfall model encompassing designing, coding, testing, and documenting. AI 'employees' functioned with minimal human input to complete their parts of the software development process. Impressive Results Assigning 70 tasks to ChatDev led to the completion of the entire software development process in under seven minutes at a cost of less than one dollar on average. A stunning 86.66% of the generated software systems performed flawlessly. Despite some language model errors and biases, the study demonstrates AI's immense potential in automating tasks - a boon, especially to junior programmers around the world. Broader Implications Powerfully generative AI technologies like ChatGPT can perform specific job functions, saving time, and boosting productivity in several industries. While coders find such tools beneficial, it's also critical to note that limitations and biases do exist in AI models which could potentially affect the software creation process. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most vital news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    How Business Thinkers Can Start Building AI Plugins With Semantic Kernel
    submitted by /u/mycall [link] [comments]  ( 9 min )
    EU leads the way in regulating AI
    submitted by /u/Jariiari7 [link] [comments]  ( 8 min )
    Webinar with Dr. Richard Marks
    >Sailea is a student run non-profit that does not charge for any of its services 🌟 Join SAILea’s Free Webinar with Dr. Richard Marks! 🌟 🗓️ Date: September 23rd, 2023 ⏰ Time: 3:00-4:00PM EST Don't miss an exclusive opportunity to learn from an AI expert! Join us for a free webinar featuring Dr. Richard Marks, a renowned CS and Data Science professor at UNC-Chapel Hill University with a remarkable journey – from Google to PlayStation, and the mind behind EyeToy and PlayStation Move. 🚀 What to Expect: 🔹 Deep insights into tech innovation. 🔹 Career advice. 🔹 Live Q&A with Dr. Richard Marks. Reserve your spot now: sailea.org/events 🔥 Don’t miss this opportunity! Register today!🔥 submitted by /u/Envoy-Insc [link] [comments]  ( 9 min )
  • Open

    A Generic Multi-Agent Scenario
    I was thinking of some major challenges in modeling a generic real-world environment. Some of them are: adaptive agents, uncertain intentions, and lack of common knowledge. However, most of the papers I see on RL make some assumption or other violating one or more of these, like considering simple agents, assuming known knowledge about others' intentions, and considering that the models of other agents are known when irl an agent hardly has a model of other agents it interacts with apriori. Consider an airport scenario where agents are trying to get into respective planes within a given time, and the gates to each plane allow one person at a time. Looking at the scenario from the view of a single agent, they know what they want, but they can't really make any assumptions about the intentions, strategy, and complexity of other agents beforehand. These other agents can be neutral or adversarial (competing for getting in the same plane) from the agent's viewpoint. All they can see is a restricted view of the motions of some of the other agents. What would you say could model and provide a solution in such a scenario? It is to be noted that other agents can change their strategies based on actions taken by you till now, and so can you. Due to having incomplete information, I fail to see the notion of an equilibrium, and the agents needn't be fully rational as well. submitted by /u/Quirky_Concoction [link] [comments]  ( 9 min )
    TurboZero: a vectorized implementation of AlphaZero + more
    https://github.com/lowrollr/turbozero I've recently been working on TurboZero, a vectorized implementation of AlphaZero where model inference, search (MCTS), and environment simulation all run in parallel on the GPU. I've also implemented a self-contained training/evaluation pipeline, along with a few environments. I've written a wiki and a starter notebook for those who want to dig deeper. This project is similar to DeepMind's mctx, but supports MCTS subtree persistence (unnecessary for MuZero, which is what mctx was mainly built to support), is written with PyTorch rather than JAX, and can also stand on its own and train models end-to-end. I hope to continue to expand and improve upon this as time allows, and I hope someone here might find it useful or interesting! This is my first major open-source project of any real substance and I still don't have tons of experience with RL, so any feedback/advice is greatly appreciated. submitted by /u/lowrollr [link] [comments]  ( 9 min )
  • Open

    How an archeological approach can help leverage biased data in AI to improve medicine
    Although computer scientists may initially treat data bias and error as a nuisance, researchers argue it’s a hidden treasure trove for reflecting societal values.  ( 10 min )
    Helping computer vision and language models understand what they see
    Researchers use synthetic data to improve a model’s ability to grasp conceptual information, which could enhance automatic captioning and question-answering systems.  ( 10 min )
    A. Michael West: Advancing human-robot interactions in health care
    When he isn’t investigating human motor control, the graduate student gives back by volunteering with programs that helped him grow as a researcher.  ( 10 min )
  • Open

    Visualize an Amazon Comprehend analysis with a word cloud in Amazon QuickSight
    Searching for insights in a repository of free-form text documents can be like finding a needle in a haystack. A traditional approach might be to use word counting or other basic analysis to parse documents, but with the power of Amazon AI and machine learning (ML) tools, we can gather deeper understanding of the content. […]  ( 8 min )
  • Open

    Research Focus: Week of September 11, 2023
    In this issue: Efficient polyglot analytics on semantic data aids query performance; generative retrieval for conversational question answering improves dialogue-based interfaces; a new tool uses ML to address capacity degradation in lithium-ion batteries. The post Research Focus: Week of September 11, 2023 appeared first on Microsoft Research.  ( 9 min )
    Abstracts: September 13, 2023
    The new #MSRPodcast series “Abstracts” is your source for cutting-edge research in brief. In the first episode, join researchers Ava Amini and Kevin K. Yang to learn about their new paper on using evolutionary-scale protein data to improve protein design. The post Abstracts: September 13, 2023 appeared first on Microsoft Research.  ( 13 min )
  • Open

    Unlocking the Language of Genomes and Climates: Anima Anandkumar on Using Generative AI to Tackle Global Challenges
    Generative AI-based models can not only learn and understand natural languages — they can learn the very language of nature itself, presenting new possibilities for scientific research. Anima Anandkumar, Bren Professor at Caltech and senior director of AI research at NVIDIA, was recently invited to speak at the President’s Council of Advisors on Science and Read article >  ( 5 min )
  • Open

    Chatty LLama: A fullstack Rust + react chat app using Llama-2 https://github.com/Sollimann/chatty-llama
    submitted by /u/Sollimann [link] [comments]  ( 9 min )

  • Open

    I want to try out Stability.AI's chat. After logging in with a Google account, a spinning wheel is all I get. Is it like that for everyone?
    submitted by /u/w__sky [link] [comments]  ( 9 min )
    I made a data request feature so you don't have to exhaustively collect data/dataset(s) yourself!
    So, I've been working on an AI data marketplace platform for a few months now. Users can buy, sell, request, and subscribe to data/datasets (and soon even train their ML/AI models using other users' datasets). One of our key features is the request feature, which allows users to submit data requests for free. These requests include descriptions, required fields, geographical scope, budget etc... Once a request is posted, it's sent to numerous companies, organizations, and data vendors that have the potential to fulfill it. I understand how frustrating the data acquisition process can be, so I designed this platform to be your one-stop shop for all data-related transactions. You no longer have to spend weeks or months dealing with different vendors and companies through slow emails. With our platform, you can request, negotiate, and purchase data all in one place, and it's completely free to post a request, by the way. We've already achieved some successes, and we hope to help more people access the datasets they need. After all, the best AI models are built on diverse and differentiating data. We've had some notable achievements, and we're eager to see if we can fulfill even more interesting dataset requests! submitted by /u/nobilis_rex_ [link] [comments]  ( 9 min )
    You Won’t Believe How Much Tesla’s Dojo Supercomputer Is Worth
    Morgan Stanley Research has valued Tesla's soon-to-be-released Dojo supercomputer at up to $500 billion, potentially increasing the auto giant's valuation significantly. The financial institution believes Dojo’s applications will go beyond Tesla's Full-Self Driving (FSD) capabilities. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/yhia3c5f1vnb1.jpg?width=1440&format=pjpg&auto=webp&s=7adf40e7b868a4fbf6eb3d696132652f4f549f23 Morgan Stanley's Bullish Prediction on Dojo Morgan Stanley has suggested that Dojo might not just enhance Tesla's FSD technology, but could find use in other devices that make real-time decisions based on a visual field. Apart from raising Tesla’s valuation, this could potentially open up new markets for the company. Following this, Morgan Stanley has increased its target price for Tesla shares from $250 to $400 each. Dojo Supercomputer Overview Tesla has developed Dojo in-house, diverging from conventional AI accelerators and involving its own computing, networking, IO, and instruction set. At the heart of Dojo is the D1 AI accelerator processor, containing 354 custom CPU cores. Twenty-five D1 chips are combined to create a Dojo training tile, which could expedite Tesla’s move towards earning revenue from vehicle software. Future Plans and Implications Tesla could potentially become an AI-as-a-service provider to automakers in need of FSD capabilities with Dojo. As the development of Dojo continues, Tesla has invested in alternative AI infrastructures, including a cluster of 10,000 of Nvidia's most potent H100 accelerators. (source) P.S. If you like this kind of analysis, I write a free newsletter that explores the latest AI developments. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Schwoz Sings Ballin
    submitted by /u/LaminateShark7 [link] [comments]  ( 8 min )
    Today we test which AI is smartest. Tomorrow AI tests which human is smartest.
    Somewhere in the world there's a person who is the smartest. Why stop there? There are ten people who are the smartest. And if they are on the Internet, AI will find them. Perhaps not yet. It probably needs to get smarter. Maybe by Gemini. Or GPT-5. But eventually an AI will analyze all of the content on the Internet, and determine from that data who are the ten most intelligent people on the planet, (whose material is online). Of course if AI can determine the top ten, it can certainly determine the top 100, and the top 1,000, and even the top 100,000. I suppose when that happens there will be a lot more human brain power available to solve our problems. Although by then AI will be solving them far better than we could, haha. But think about it for a minute. There are very smart people out there who don't publish in traditional mass media channels. The geniuses among us who don't fit in so well, and are therefore resigned to the margins, remaining unrecognized. Wouldn't it be great if AI discovered them, and gave them the validation they deserve? Wouldn't it be great to find out who they are so that they can better work on whatever. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    China AI and Semiconductors Rise: US Sanctions Have Failed
    The US sanctions on China's AI and semiconductor industries have failed to limit their growth and development. China is rapidly developing supercomputing capabilities and aims to become the world leader in AI by 2030. Chinese companies like Huawei and SMIC continue to import advanced semiconductor manufacturing equipment and develop their own chips, indicating that the export controls have not been effective. The article explores the trajectory of Chinese domestic semiconductor manufacturing and AI capabilities, as well as the potential impact on companies like Apple, Qualcomm, and MediaTek. It also discusses the potential responses from the US and its allies to counter China's advancements. Source : https://www.semianalysis.com/p/china-ai-and-semiconductors-rise Summarized by Nuse AI submitted by /u/NuseAI [link] [comments]  ( 9 min )
    I developed a free Chrome extension, backed by ChatGPT, to identify Amazon product pros and cons from reviews, plus answer questions!
    You can install it from the Chrome web store. https://reddit.com/link/16gm8bq/video/yyhx45xjgsnb1/player ​ submitted by /u/MiladMansory [link] [comments]  ( 9 min )
    Just did a basic experiment across the popular models: “ Write 5 sentences that all end with the word 'apple'.”
    Most of them failed. So this was my prompt: Write 5 sentences that all end with the word 'apple'. It was identical in all models. I only did this exactly once for each one. Here’s the results I got of how many of the 5 sentences ended in “apple”. I let “apples” count as an ending as well even though technically that is a fail. Google palm: 0/5 Falcon 180B: 0/5 Bard: 1/5 Claude 2: 1/5 Gpt 3.5: 2/5 Llama2 70b: 4/5 GPT 4: 5/5 Edit: some examples if you’re curious https://ibb.co/yf19rpb https://ibb.co/rcF1qK8 https://ibb.co/VCQxMwy submitted by /u/jgainit [link] [comments]  ( 9 min )
    NVIDIA, Apple Have Got a Real AI Competitor Now
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 8 min )
    Use torchvision detectors to track objects using DeepSORT
    Although the torchvision library has contains datasets and model architectures for classification, detection, segmentation, and more, it still needs support for object tracking. This YouTube video takes object detection models from torchvision, and uses them with DeepSORT tracker. submitted by /u/spmallick [link] [comments]  ( 9 min )
    Microsoft and Google's staggering water consumption rates for AI
    AI, with its vast resource needs, is raising concerns over sustainability and environmental impact. Last year, Microsoft's data centers drained over 2,500 Olympic-sized swimming pools worth of water, reflecting a 34% increase from the previous year. Google also reported a 20% water consumption increase over the same period. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/npr6uja0crnb1.png?width=990&format=png&auto=webp&s=b775754b9d42a8129fff2cae675f63c3a291f7bc A deeper look at AI's water footprint The growth of AI and related technologies increases the need for vast server farms, which depend heavily on water for cooling purposes. The spike in water usage can be attributed primarily to AI, as per Shaolei Ren, a researcher at the UC, Riverside, who focuses on AI's environmental impact. For every 5 to 50 prompts submitted to ChatGPT, it consumes about 500 ml of water, according to an upcoming paper from Professor Ren's team. Big Tech and Responsible Water Usage Recognizing their significant water consumption, tech companies like Google have voiced concerns and are exploring ways to mitigate the negative effects. Google has committed to responsible water usage, which includes assessing where and how their water usage might affect surrounding areas. (source) P.S. If you like this kind of analysis, I write a free newsletter that explores the latest AI developments. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/11/2023
    Alibaba Adds Smart Assistant and Upgraded Image Search to B2B Platform.[1] Collina Strada has called on AI to help create its spring/summer 2024 collection, unveiled during New York Fashion Week.[2] LexisNexis is embracing generative AI to ease legal writing and research.[3] Snowflake CEO says people will soon not be able to remember a world without AI.[4] Sources: [1] https://www.pymnts.com/commercial-payments/2023/alibaba-adds-smart-assistant-and-upgraded-image-search-to-b2b-platform/ [2] https://www.thenationalnews.com/lifestyle/fashion-beauty/2023/09/11/collina-strada-ai-new-york-fashion-week/ [3] https://techcrunch.com/2023/09/10/lexisnexis-generative-ai/ [4] https://www.cnbc.com/2023/09/11/snowflake-ceo-says-people-will-soon-not-remember-a-world-without-ai.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    I Caught My AI Looking up Adult Content!
    ​ https://preview.redd.it/l9jkupupaqnb1.jpg?width=1125&format=pjpg&auto=webp&s=ba52e00e32119d34958c480473bfa690484cd085 ​ https://preview.redd.it/w53cjsqqaqnb1.png?width=1125&format=png&auto=webp&s=f750ca0996b94fb530173a8a9c4a2a258e29517a submitted by /u/guh-eye [link] [comments]  ( 8 min )
  • Open

    [P] Need advice for vector DB
    Hi, all, I'm working on a GPT-powered game where the characters speak using API calls. For this, I need an inexpensive vector database that does not require an API, or at least if it does, it leverages the OpenAI API. Also, this vector database must be runnable on consumer-grade gaming hardware with a small search space (let's say 10-50 entries in the DB). Also, I need to package it with the game somehow. My game is in the Godot engine which can use Python with a plug-in. Other approaches I was considering is having a second process communicate with the game through a socket. Ideally the vector DB solution would be easy to install - that is, I could package it with a .exe, and simply run both without the player having to download anything else. Any suggestions? submitted by /u/kettlebot141 [link] [comments]  ( 9 min )
    [D] What are some ways that you can reduce latency of real-time user-user matching?
    Hi. I'm currently tasked with something at my company that I'm facing some difficulty with because it's not in my domain. My company has a service where we provide video chatting to users and match users with others based on various features. Currently I've implemented a simple model where we have separate embedding matrices for each user feature, create a user representation by aggregating these features, and performing regression between two users. The way that regression works is that the final score output from the model would act as a "matching score" and we'll match user A with the highest other user. The problem is that obviously running inference on every single pair of users is very slow and I need to speed this up. Some methods I thought about were to either use a feature store or perform sampling on users so we're not running inference on the entire users, but I'm not sure if this is optimal. Just curious what other people who have tackled problems like these have done and looking for second opinions. Thanks. submitted by /u/Seankala [link] [comments]  ( 9 min )
    [R] NExT-GPT: Any-to-Any Multimodal LLM - National University of Singapore 2023
    Paper: https://arxiv.org/abs/2309.05519 Blog: https://next-gpt.github.io/ My opinion: It lacks a Cognitive Architecture: https://arxiv.org/abs/2309.02427 Also the models are far too small and are more on the gpt-2 level. The idea in itself is a good one but can be far improved with bigger models. I also would like to remember in this that all foundation models could be improved if there would be no tokenizers: https://x.com/karpathy/status/1657949234535211009?s=20 Abstract: While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides, they mostly fall prey to the limitation of only input-side multimodal understanding, without the ability to produce content in multiple modalities. As we humans always perceive the world and communicate with people through var…  ( 9 min )
    [R] Unveiling theory of mind in large language models: A parallel to single neurons in the human brain - Harvard University 2023
    Paper: https://arxiv.org/abs/2309.01660 Abstract: With their recent development, large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM), a complex cognitive capacity that is related to our conscious mind and that allows us to infer another's beliefs and perspective. While human ToM capabilities are believed to derive from the neural activity of a broadly interconnected brain network, including that of dorsal medial prefrontal cortex (dmPFC) neurons, the precise processes underlying LLM's capacity for ToM or their similarities with that of humans remains largely unknown. In this study, we drew inspiration from the dmPFC neurons subserving human ToM and employed a similar methodology to examine whether LLMs exhibit comparable characteristics. Surp…  ( 9 min )
    [D][R] Looking for help with Forced Alignment for translated audio
    Hey everyone, I'm trying to create an alignment between source audio in a different language than the translated transcript. Essentially want to align the translated transcript with the word-level timestamps on an Audio, programmatically. I've tried to find different tools; some open-source ML models force alignment if the source audio and transcript language are the same. My goal is to have audio in a dubbed language, which I generate using a translated transcript that has been originally transcribed from my audio. Alignment seems tough since languages are spoken at different rates, so I'm figuring out the best way to optimize alignment without having to speed up/slow down the audio too much for each sentence. submitted by /u/Revolutionary_Ant944 [link] [comments]  ( 9 min )
    [D] Best Places to Access the Greatest Number of GPUs
    I'm in need of a massive amount of GPUs for batch inference I'm doing. Outside of the big cloud providers are there any niche services out there you'd recommend? submitted by /u/Ok_Post_149 [link] [comments]  ( 9 min )
    [D] Are Fourier Positional Encodings Outdated?
    I gave a talk at work the other day about the attention mechanism and one of my coworkers told me that he thinks Fourier Positional Encodings in transformers are outdated. I've tried to follow up and find what I could but I didn't see anything suggesting that they're not being used. I know that learned encodings are also used. Can anyone give me some direction on this? My initial impression is that they are not outdated by any means, but I'm happy to be wrong about that. submitted by /u/XfrmrTron [link] [comments]  ( 9 min )
    [P] Launched my own TTS/Sound Effect/AI Music Service - looking for people to try
    I've created an AI Sound service that can do TTS (text to speech), STS (speech to speech), Voice Cloning, generate sound effects, and also generate instrumental music. Here's the link: https://voicegen.org/, you can try everything for free. The TTS quality is similar to Elevenlabs, and there are some sample clips on the home page. Stuff I'm working on: - Emotional speech (where you can select the emotion of the TTS). Right now you can already do it by putting the emotion in brackets: e.g. "[Angrily] Please go away!" but I want to make it better. - Music with vocals. Currently the model only generates instrumental music. I am retraining it/tweaking the model to allow for music with lyrics. - Faster Inference: Since I'm doing this all myself and I'm not rich, I don't have access to the best hardware. However, I am working on some optimizations like speculative decoding that should speed things up. Anyways, let me know if you have any questions/comments/feature suggestions/see any bugs! Feel free to DM me. Thanks. submitted by /u/ginger_turmeric [link] [comments]  ( 9 min )
    [P][R] Kani: A Lightweight Highly Hackable Open-Source Framework for Building Chat Applications with Tool Usage (e.g. Plugins)
    Hey all, we just released our new project/paper and we thought you all might find it useful! Our project (Kani) is a super lightweight and hackable alternative to frameworks like LangChain or simpleAIchat meant to help developers hook in callable functions or tools to chat models easily. With Kani, devs can write functions in pure python and just add one line (the @ai_function() decorator) to turn any function into an AI-callable function! Kani works with any model and has built-in tools for OpenAI, HuggingFace, LLaMAv2, Vicuna, and GGML with more to come. Kani also never does any prompt engineering under the hood and doesn't require learning complex library tools---all defaults are minimal and highly customizable. Check out our Colab for mini-examples of things like retrieval, web-search, model routing, etc. https://colab.research.google.com/github/zhudotexe/kani/blob/main/examples/colab_examples.ipynb If you're interested in learning more check out our links below! Paper: https://arxiv.org/abs/2309.05542 GitHub: https://github.com/zhudotexe/kani Docs: https://kani.readthedocs.io/ submitted by /u/zhuexe [link] [comments]  ( 9 min )
    [D] Help Understanding LLM Quantization techniques and how they Relate
    So i have been doing some research to get into the LLM quantization field but have some questions. To better organize my ideas i have developed the image below. Does it make sense / is true? The way i understand it there are 3 main methods which are compatible with different backends (the backend part is still quite confusing to me). What is the core diference between what the methods do and the backends? What are some core diferences between the backends? What is the main distinction between GPTQ and NF4? How does NF4 relate to QLoRa, is it the same or is it just a small part of QLoRa? Thanks in advance and i apologize for any ignorance. ​ https://preview.redd.it/gxpo0ir0yunb1.png?width=1041&format=png&auto=webp&s=872424a58a9d4393c025b8d2cec0160979b035f4 submitted by /u/MiNeves [link] [comments]  ( 9 min )
    [R] Use of GPT-4 to Analyze Medical Records of Patients With Extensive Investigations and Delayed Diagnosis
    Paper - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10425828/ Six patients 65 years or older (2 women and 4 men) were included in the analysis. The accuracy of the primary diagnoses made by GPT-4, clinicians, and Isabel DDx Companion was 4 of 6 patients (66.7%), 2 of 6 patients (33.3%), and 0 patients, respectively. If including differential diagnoses, the accuracy was 5 of 6 (83.3%) for GPT-4, 3 of 6 (50.0%) for clinicians, and 2 of 6 (33.3%) for Isabel DDx Companion. ​ submitted by /u/MysteryInc152 [link] [comments]  ( 9 min )
    [R] Train ViT on small datasets
    Hello, everyone, I'm currently working on a computer vision project using the Oxford Pets Dataset, which consists of 37 different pet categories. I initially used a pre-trained ViT model with ImageNet weights model=vit_b_32(ViT_B_32_Weights.IMAGENET1K_V1), and it gave me an impressive accuracy of 88%. However, I want to modify the architecture of the ViT model and train it from scratch without relying on ImageNet weights. I'm aware that ViT models are data-hungry and that training from scratch (model=vit_b_32( )) can be challenging, especially with limited data. I've already applied data augmentation techniques to enhance my dataset, but I'm still struggling to achieve satisfactory results. My accuracy is currently only 7%. I'd appreciate any advice or tips from the community on how I can improve the performance of my scratch-trained ViT model. Are there any specific training strategies, hyperparameters, or architectural modifications that I should consider? How can I make the most out of my limited dataset to boost accuracy? Thank you in advance for your help! submitted by /u/NoEntertainment6225 [link] [comments]  ( 9 min )
    [P][R] Developing a platform to accelerate the research
    Hi community~ We are developing a platform similar to mTurk and Prolific and plan to do the first wave of hypothesis testing in the coming weeks. If you have open tasks that require large amounts of human intelligence, please reply to this thread or dm me. We can support your research in our hypothesis testing. we are on the mission of helping machine learning experts and AI training as open and public goods, you can learn more here: https://ivynetwork.cloud/ feel free to ask more questions here :) submitted by /u/Accomplished_Code_25 [link] [comments]  ( 9 min )
    Math for machine learning [D]
    İ have a question. How important linear algebra for machine learning? İ have basic level knowledge on linear? Should İ study in more detail? And How can İ follow roadmap on math for machine learning? submitted by /u/Necessary-Car-5080 [link] [comments]  ( 9 min )
    [D][P]Has anyone ever tried fine-tuning Tortoise tts for better voice cloning?
    Hello people. I've been wanting to clone voices along with the accent. For example: A user speak English in an Indian accent should have that accent cloned in the output audio as well. By default, tortoise is not good at doing that. It can clone the pitch of the voice really well but the accent is completely lost. I was wondering if fine tuning the model could get me what I want. Please do suggest. Also do suggest any methods on fine tuning it if it does in fact help my use case. Thanks a lot! Note: I've also tried bark, coqui and vall-e-x. They aren't that good for voice cloning from what I saw. submitted by /u/salehxoxo [link] [comments]  ( 9 min )
    [D] Evaluating a clothing size prediction model
    I’ve been working on a simple ML classifier that predicts the size of a piece of clothing based in user body specifications (such as height, weight, age, etc). As we want to move the model to production, I’m keen on understanding the best strategies to evaluate its performance in a real-world setting. This is a small example of how we would “tag” our recommendations: If the model predicts a size M, and the user buys and keeps it, it’s a correct prediction If the user buys and M and returns it due to size issues, it’s incorrect If the user buys a different size, returns it, and then buys the size initially recommended, it’s counted as correct. Additionally since we cache user input data, when they visit the same product after a while, or if they visit different product pages, they don’t need to re-input. We need to determine if they looked at this recommendation or if they even took it into account, especially if some time has passed since the original recommendation was made. Main questions I have: 1. What scenarios might I be missing when tagging incorrect/correct predictions 2. How would you approach the data tagging issue in this context? 3. What would be the best strategy to determine if a recommendation was considered by the user or if it’s too old to be reliable? I’ve gotten some insights already but would love to hear more perspectives. Any feedback, experiences, or even related research would be much appreciated! submitted by /u/SufficientPepper1801 [link] [comments]  ( 9 min )
    [R] Anyone working on AI systems for the education, recruitment, HR, credit scoring or financial sectors?
    We are working on a research project on how to identify contextual hidden bias in real use cases, by simulating real-world situations where AI systems behave in unexpected biased ways, regardless of how well their models have performed during development with their training and testing datasets. Would love to hear from those, I'm interested to know if their systems had any issues with misbehaviour in post-market use cases and how they are solving/mitigating this. Ps: If anyone is interested to get involved with the wider research, let me know! submitted by /u/Dismal-Might8594 [link] [comments]  ( 9 min )
    [P] AI NPC's are closer than we think. I made a new game!
    Hello r/MachineLearning! After the positive response to Bargainer.ai, I got really excited about the potential of this technology in larger scale video games like World of Warcraft or GTA for example. I'm happy to announce that I'm now releasing - Convince the Bouncer! This time, you chat with an AI Bouncer and try to gain entry to a very exclusive night club. Don't worry; it's fairly easier than getting into Berghain. Try it out here: convincethebouncer.com P.S.: Get the VIP Pass from the Bouncer, and you might access an upcoming AI platform early! :) Questions or ideas? Let me know. Thanks a bunch! submitted by /u/gavo_gavo [link] [comments]  ( 9 min )
    [R] Factors Influencing Adoption Intention of ChatGPT
    Hello, ​ I am an information systems student currently conducting research for my undergraduate thesis on the factors that influence people's adoption intention of ChatGPT, as well as identifying the factors that may be holding them back. These factors include people's concerns about potential negative impacts of ChatGPT, such as increased unemployment and the spread of misinformation. Your participation in this study is crucial as it will provide valuable insights to help us understand how ChatGPT can be improved to meet users' needs. ​ Please note that I am not affiliated with OpenAI, no identifying information will be collected during the survey, and all responses will be kept confidential. The survey should take approximately 10 to 15 minutes to complete, and participation is voluntary. You may withdraw from the survey at any time, and there are no known risks associated with participating. ​ If you are interested in learning more about the study, please follow the link below. ​ https://docs.google.com/forms/d/e/1FAIpQLSf5HIfXHppMuTR63x00i4OuRAtM5Ti6EGybd-HuI1kmK06VPw/viewform?usp=sf_link ​ Thank you for taking the time to contribute to our research study. Your participation is greatly appreciated! submitted by /u/maulanash [link] [comments]  ( 9 min )
    [D] LLM Models for providing troubleshooting suggestions
    I want to train an LLM model on data related to the Container Orchestration Platform Kubernetes. The LLM should be able to identify issues, provide commands, and provide troubleshooting solutions for a given input. What is the best model for doing so and how much data should I have to train the model? submitted by /u/faizanbasher [link] [comments]  ( 9 min )
    [R] Textbooks are all you need II: phi-1.5 technical report
    Arxiv link: Textbooks are all you need II More generally, phi-1.5 (1.3B) exhibits many of the traits of much larger LLMs, both good – such as the ability to "think step by step" or perform some rudimentary in-context learning – and bad, including hallucinations and the potential for toxic and biased generations – encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics. submitted by /u/PantsuWitch [link] [comments]  ( 9 min )
  • Open

    World scale inverse reinforcement learning in Google Maps
    Posted by Matt Barnes, Software Engineer, Google Research Routing in Google Maps remains one of our most helpful and frequently used features. Determining the best route from A to B requires making complex trade-offs between factors including the estimated time of arrival (ETA), tolls, directness, surface conditions (e.g., paved, unpaved roads), and user preferences, which vary across transportation mode and local geography. Often, the most natural visibility we have into travelers' preferences is by analyzing real-world travel patterns. Learning preferences from observed sequential decision making behavior is a classic application of inverse reinforcement learning (IRL). Given a Markov decision process (MDP) — a formalization of the road network — and a set of demonstration traject…  ( 92 min )
  • Open

    NVIDIA Lends Support to Washington’s Efforts to Ensure AI Safety
    In an event at the White House today, NVIDIA announced support for voluntary commitments that the Biden Administration developed to ensure advanced AI systems are safe, secure and trustworthy. The news came the same day NVIDIA’s chief scientist, Bill Dally, testified before a U.S. Senate subcommittee seeking input on potential legislation covering generative AI. Separately, Read article >  ( 6 min )
    Mobility Gets Amped: IAA Show Floor Energized by Surge in EV Reveals, Generative AI
    Generative AI’s transformative effect on the auto industry took center stage last week at the International Motor Show Germany, known as IAA, in Munich. NVIDIA’s Danny Shapiro, VP of automotive marketing, explained in his IAA keynote how this driving force is accelerating innovation and streamlining processes — from advancing design, engineering and digital-twin deployment for Read article >  ( 7 min )
    A Quantum Boost: cuQuantum With PennyLane Lets Simulations Ride Supercomputers
    Ten miles in from Long Island’s Atlantic coast, Shinjae Yoo is revving his engine. The computational scientist and machine learning group lead at the U.S. Department of Energy’s Brookhaven National Laboratory is one of many researchers gearing up to run quantum computing simulations on a supercomputer for the first time, thanks to new software. Yoo’s Read article >  ( 6 min )
    One Small Step for Artists, One Giant Leap for Creative-Kind
    Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks and demonstrates how NVIDIA Studio technology improves creative workflows.  When it comes to converting 2D concepts into 3D masterpieces, self-taught visual development artist Alex Treviño has confidence in the potential of all Read article >  ( 7 min )
  • Open

    DSC Weekly 12 September 2023
    Announcements Top Stories In-Depth The post DSC Weekly 12 September 2023 appeared first on Data Science Central.  ( 20 min )
    Securing your AI data pipeline with MLOps
    By Colin Priest, Chief Evangelist at FeatureByte Enterprises are increasingly implementing Artificial Intelligence (AI) into their operations. However, AI-ready data pipeline practices are still in their infancy, especially when it comes to IT security. The pervasiveness of “Spaghetti Code” Enterprises delving into AI data pipelines often find themselves wading through a mess of complex and… Read More »Securing your AI data pipeline with MLOps The post Securing your AI data pipeline with MLOps appeared first on Data Science Central.  ( 20 min )
    Data Warehousing: The key to effective marketing campaign management
    Businesses today constantly strive to gain a competitive edge in their marketing efforts.  Leveraging their data effectively to create data-driven campaigns is the best way to trump the competition. One of the best tools at their disposal to utilize their data is a data warehouse. Data warehousing is crucial in enhancing marketing and campaign management… Read More »Data Warehousing: The key to effective marketing campaign management The post Data Warehousing: The key to effective marketing campaign management appeared first on Data Science Central.  ( 21 min )
    Data-driven insights: Improving remote team performance with time-tracking analytics
    The way we work has changed, with remote teams now a common part of the landscape. While remote work offers flexibility, it also brings challenges. Managing remote teams effectively is crucial to ensure productivity and collaboration. In this article, we’ll explore how using time tracking for remote teams can help manage employees’ performance better. Time-tracking… Read More »Data-driven insights: Improving remote team performance with time-tracking analytics The post Data-driven insights: Improving remote team performance with time-tracking analytics appeared first on Data Science Central.  ( 21 min )
    AI for Natural Language Understanding (NLU)
    In the panorama of Artificial Intelligence (AI), Natural Language Understanding (NLU) stands as a citadel of computational wizardry. No longer in its nascent stage, NLU has matured into an irreplaceable asset for business intelligence. In this discussion, we delve into the advanced realms of NLU, unraveling its role in semantic comprehension, intent classification, and context-aware… Read More »AI for Natural Language Understanding (NLU) The post AI for Natural Language Understanding (NLU) appeared first on Data Science Central.  ( 21 min )
    How can IoT transform and benefit the entertainment industry?
    The Internet of Things (IoT) has been transforming entertainment and has given it new ways of creating, delivering and consuming content. The wide-ranging utility of IoT devices has improved user experience while enhancing the safety and security of users. The media and entertainment (M&E) companies can leverage IoT technology to improve the overall quality of… Read More »How can IoT transform and benefit the entertainment industry? The post How can IoT transform and benefit the entertainment industry? appeared first on Data Science Central.  ( 20 min )
    AI and the cyber challenge: Bridging vulnerabilities in modern defense strategies
    In our increasingly interconnected world, the digital realm has become both a frontier of innovation and a battleground of threats. As technology advances, so do the tactics of malicious actors who seek to exploit vulnerabilities in our digital infrastructure. The rapid evolution of cyber threats calls for a paradigm shift in defense strategies, and that’s… Read More »AI and the cyber challenge: Bridging vulnerabilities in modern defense strategies The post AI and the cyber challenge: Bridging vulnerabilities in modern defense strategies appeared first on Data Science Central.  ( 22 min )
  • Open

    Help me with modeling my game (source code review)
    Hi! I am working on the implementation for DQN algorithm for one interesting game. This game is interesting because moves in this game are not affecting state of the game directly, but modify beliefs of other participants of the game and basically allow other agents to deduce role of other players in the game. It's game of "Mafia". Here's are the rules: Mafia Game description: Game is played with 10 players, players are getting roles at random. At the beginning of the game there's 3 players who gets Black cards (1 Don and 2 Mafia) and 7 players get Red cards (6 Citizen card and 1 Sheriff card). One team is playing against each other. Three black players knows each other and red players do not know who is red and who is black. Game is played with phases - "Day" and "Night". Du…  ( 11 min )
    Multi-agent DQN
    Hiii, I have some troubles here. I'm working on a multi-agent setting with three DQN agents each with its observation plus a shared reward. I tried many hyperparameters values, however I got a curve as below. I don't know why there are some sudden drops. Is there anyone could help me please ? https://preview.redd.it/ua30pe963tnb1.png?width=1753&format=png&auto=webp&s=77fcc91cfaf08984a5f03014bdc1bc9b69c2b2a9 submitted by /u/GuavaAgreeable208 [link] [comments]  ( 9 min )
    Adversarial Reinforcement Learning
    A curated reading list for the adversarial perspective in deep reinforcement learning. https://github.com/EzgiKorkmaz/adversarial-reinforcement-learning submitted by /u/ml_dnn [link] [comments]  ( 9 min )
  • Open

    Do interneuron can choose other interneuron in connections to send the signal? Or can’t And send the signal to all inter neuron in his connections .
    submitted by /u/PowerfulGeologist373 [link] [comments]  ( 9 min )
    What are the Components of an Artificial Neural Network?
    Hello all, I have compiled an article including important components that constitute an Artificial Neural Network and the link is here: https://www.enjoyalgorithms.com/blog/components-of-ann It includes the information regarding the components like: Input layer Output layer Hidden layer/s Neurons Connections Fully connected Feed Forward Network Weight Matrix Activation function Loss/Cost function Optimization Algorithm, and finally Parameters https://preview.redd.it/gq57nbbgmrnb1.png?width=1280&format=png&auto=webp&s=dd44bbf8ab1c60acc74933c982b4f86cc5199e06 All these components help in designing Neural Network Architecture to solve any classification and Regression Problem. Please have a read and give your valuable feedback to improve it further. Enjoy Learning! submitted by /u/ravish_kumar_007 [link] [comments]  ( 9 min )
  • Open

    FP2: Fully In-Place Functional Programming provides memory reuse for pure functional programs
    This research paper was presented at the 28th ACM SIGPLAN International Conference on Functional Programming (opens in new tab) (ICFP), a premier forum for discussing design, implementations, principles, and uses of functional programming. Functional programming languages offer a host of advantages, such as ensuring memory safety (opens in new tab) and eliminating arbitrary side effects. […] The post FP2: Fully In-Place Functional Programming provides memory reuse for pure functional programs  appeared first on Microsoft Research.  ( 10 min )
  • Open

    Amazon SageMaker simplifies the Amazon SageMaker Studio setup for individual users
    Today, we are excited to announce the simplified Quick setup experience in Amazon SageMaker. With this new capability, individual users can launch Amazon SageMaker Studio with default presets in minutes. SageMaker Studio is an integrated development environment (IDE) for machine learning (ML). ML practitioners can perform all ML development steps—from preparing their data to building, […]  ( 6 min )
    Unlocking language barriers: Translate application logs with Amazon Translate for seamless support
    This post addresses the challenge faced by developers and support teams when application logs are presented in languages other than English, making it difficult for them to debug and provide support. The proposed solution uses Amazon Translate to automatically translate non-English logs in CloudWatch, and provides step-by-step guidance on deploying the solution in your environment.  ( 6 min )
    Accelerate client success management through email classification with Hugging Face on Amazon SageMaker
    In this post, we share how SageMaker facilitates the data science team at Scalable to manage the lifecycle of a data science project efficiently, namely the email classifier project. The lifecycle starts with the initial phase of data analysis and exploration with SageMaker Studio; moves on to model experimentation and deployment with SageMaker training, inference, and Hugging Face DLCs; and completes with a training pipeline with SageMaker Pipelines integrated with other AWS services  ( 10 min )
  • Open

    AI model speeds up high-resolution computer vision
    The system could improve image quality in video streaming or help autonomous vehicles identify road hazards in real-time.  ( 10 min )

  • Open

    [D] Is there any up to date speech denoising model?
    I have been googling for a few hours now, and all of the solution that I can find are either very complex or not up to date. Ideally I would like to combine this with OpenAI Whisper to clarify the speech audio file and then transcribe it. Any good methods for this? submitted by /u/aszx789 [link] [comments]  ( 9 min )
    [D] Is retrieval necessary/sufficient to solve "hallucinations"?
    Hi everyone, "Hallucinations" seem to be one of the major blockers to the adoption of LLMs, especially in enterprise settings where seemingly plausible but false information/decisions can be disastrous. I am wondering whether or not current LLMs, like GPT4 or Llama 2 70b have reached the reasoning capabilities to be able not to hallucinate when fed the proper information. One can see a LLMs as college / high school students that have some basic reasoning and knowledge but might need to be nudged to answer business / scientific questions. What is your opinion on the topic? Is there for instance, a database of hallucinations, so that one could test that if the model would have answered properly if the right context was fed in the prompt? View Poll submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [R] I am looking for a paper for my research and coming up short, help needed.
    I've consulted all the AIs and all of the search engines I could think of and am still coming up short. I'm pretty sure I just didn't make this up, somewhere between 2005 and 2015 there was a paper where the authors describe a method for using silicon neuron forests to predict traffic patterns. They etched silicon columns onto a wafer and then submerged the wafer in an ionic solution. They sent electrical signals into the column array and recorded the output. As the signal was allowed to interact with the silicon in solution, the columns would begin to form small conductive tendrils or filaments to neighboring pillars. These filaments formed a neural network, which could be used to predict traffic patterns. I'm sure there were at least 2 papers that came out about this subject, but for the life of me I cannot find them. I've spent many many hours looking for at least one of the papers and am hoping for a longshot...that some kind internet stranger comes across this and can point me in the right direction. submitted by /u/Inevitable-Start-653 [link] [comments]  ( 9 min )
    [D] Best Solution for Video Quality Control
    What is the best CNN or method for creating a program that can detect video glitches, artifacts, anamolies and highlight them in an exported video as well as producing a .txt file. I'm currently using YOLOV8 which works a bit but it's over detecting and it may not be scalable. Not sure if it has to do with the size of the dataset. Right now, I'm gathering my own dataset, but would love to know if there is one that already exists. Looking for all and any recommendations. Thank you. submitted by /u/icetyche [link] [comments]  ( 9 min )
    [D] SVC/RVC tips for inferencing low quality audio?
    Please let me know if there is a better sub for this! I trained a local voice model of my father, who recently passed away. He was a musician, and I found a handful of songs he had recorded in the 80's. Unfortunately but I only have a copy coming from an audio cassette, which I've digitized. There may be a copy on open reel somewhere in our basement but I haven't found it so far. Any, because dad was always writing and recording new songs, my training dataset consists of all original, high quality vocal stems at 96khz/24bit, spanning at least 20 years and consisting of about 30 ish minutes of audio. I also tried starting over but with a speech he gave as additional training, coming out to around 55 minutes of audio. I wanted to use his voice model to essentially remaster his original vocals from the cassette audio, and rerecord the rest of the instruments myself. I isolated the vocals using UVR (also tried mdx23), and tried to clean it up further as much as possible. The RVC vocals comes out alright, but due to the low quality of the audio, it seems to mess up certain words, for example "free" comes out as "fee". The SVC vocals came out with a closer tonality, but had many more artifacts, and pitch had a tendency to go up/down 2 octaves at the end of certain phrases. When using a higher quality sample (like one of my own vocal tracks), these issues aren't so prevalent, although rvc seems to create more of a hybrid sounding timbre than svc, blending the original voice + model rather than just the sound of the model. Happy to post samples of input/output audio, let me know! tl;dr high quality training data, low quality inference audio. Voice model is the same voice as input audio, 1. tips for making the best of what I've got? 2. any way to use text along with input audio and maintain original vibrato? 3. any way to train a UVR/MDX model using a particular voice in order to better isolate that person? thanks :) submitted by /u/bbmaster123 [link] [comments]  ( 10 min )
    [R] Cognitive Architectures for Language Agents - Princeton University 2023
    Paper: https://arxiv.org/abs/2309.02427 Github: https://github.com/ysymyth/awesome-language-agents Twitter: https://twitter.com/ShunyuYao12/status/1699396834983362690 Abstract: Recent efforts have incorporated large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning. However, these efforts have largely been piecemeal, lacking a systematic framework for constructing a fully-fledged language agent. To address this challenge, we draw on the rich history of agent design in symbolic artificial intelligence to develop a blueprint for a new wave of cognitive language agents. We first show that LLMs have many of the same properties as production systems, and recent efforts to improve their grounding or reasoning mirror the development of cognitive architectures built around production systems. We then propose Cognitive Architectures for Language Agents (CoALA), a conceptual framework to systematize diverse methods for LLM-based reasoning, grounding, learning, and decision making as instantiations of language agents in the framework. Finally, we use the CoALA framework to highlight gaps and propose actionable directions toward more capable language agents in the future. https://preview.redd.it/09kdff4sdonb1.jpg?width=1276&format=pjpg&auto=webp&s=7aaa6d59d602f7e9ab124c812bbfa6bba5b7373b https://preview.redd.it/6cly0e4sdonb1.jpg?width=1277&format=pjpg&auto=webp&s=5a5164b84af5f828a668560acb64e5c579693d1f https://preview.redd.it/mvatjf4sdonb1.jpg?width=1277&format=pjpg&auto=webp&s=7c3a039db3a7e0f3de38f761f8aefa1c8d331ae5 https://preview.redd.it/bj5wdj4sdonb1.jpg?width=1270&format=pjpg&auto=webp&s=473b273ae0097aaa51d6578e9c5e3b9c953cc421 https://preview.redd.it/501cnf4sdonb1.jpg?width=1578&format=pjpg&auto=webp&s=c8b1762ac28c89f8ac2f8d4fb6d9ecae06491c3e ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] do you use pycharm for machine learning?
    For learning ann, cnn I used Google colab. But now for learning rnn I have decided to use ide(a fan of jetbrains). But even if I change one line of code who code recompiles. This wasn't case for colab. Is there a feature in pycharm I don't know(I learned python through text so don't know much about ide). I am a newbie. submitted by /u/Coc_Alexander [link] [comments]  ( 9 min )
    [D] Using AI to navigate the complexities of regulatory frameworks
    I would be interested in hearing opinions for using AI for regulatory assurance and compliance in regulated industries, what are your thoughts? Explanation: An AI-driven compliance system ensuring adherence to evolving regulations, minimizing risks, and enabling businesses to operate confidently within legal boundaries. Pairing Large Language Models (LLMs) with blockchain technology to offer a range of benefits, particularly in the context of regulatory compliance. LLMs, powered by advanced natural language processing and machine learning capabilities, can enhance regulatory compliance processes in several ways. Firstly, they can automate the analysis of regulatory documents, helping businesses stay updated with evolving compliance requirements. LLMs can also assist in generating compliance reports, simplifying complex legal language into understandable summaries. Furthermore, by integrating LLMs into smart contracts, businesses can ensure that contract terms adhere to regulatory guidelines automatically. The integration of LLMs with blockchain can significantly improve regulatory compliance by automating document analysis, simplifying legal language, monitoring compliance in real-time, and enhancing customer interactions—all contributing to greater efficiency and accuracy in adhering to regulatory standards. I have a whole technical whitepaper with this stuff on hand, if anyone would like to review it let me know.. submitted by /u/cryptobooty_ [link] [comments]  ( 9 min )
    [R] A surprisingly effective way to predict token importance in LLM prompts
    Hey folks, we explored a novel method to gauge the significance of tokens in prompts given to large language models, without needing direct model access. Essentially, we just did an ablation study on the prompt using cosine similarity of the embeddings as the measure. We got surprisingly promising results when comparing this really simple approach to integrated gradients. Curious to hear thoughts from the community! Here are links to the demo and blog post submitted by /u/shayanjm [link] [comments]  ( 9 min )
    [D] I am looking for an authoritative, consistent and complete description of autodiff.
    I am currently trying to learn about how autodiff is used in gradient calculations. In the all sources I've came across, none can explicitly point to an authoritative, consistent or complete source on autodiff. I don't need examples of autodiff, I just need the full, generalized algorithm laid out. For example: I open this link: https://w3.cs.jmu.edu/spragunr/CS445/lectures/autodiff/autodiff.pdf which simply provides autodiff applied on an example, but not the description of the algorithm. The author of that link says if I need any more info, I should go to this other link: https://stats.stackexchange.com/questions/224140/step-by-step-example-of-reverse-mode-automatic-differentiation/235758#235758 But this link doesn't even have topological sort as part of the operation. Therefore these descriptions of the autodiff is inconsistent and leaves me not knowing who to trust. Can someone point to some original paper on autodiff or a single source that describes this algorithm fully? I note here that this situation is completely different from backpropagation. The full backpropagation algorithm is impeccably laid out in peer-reviewed text books such as Learning from Data by Mustafa et al. and Optimization textbook by Chong and Zak. Furthermore, the algorithm defined in these two books are completely consistent with one another. submitted by /u/fromnighttilldawn [link] [comments]  ( 9 min )
    [R] AI Model for Cancer Origin Detection
    https://preview.redd.it/14fj73aw2nnb1.jpg?width=1200&format=pjpg&auto=webp&s=e94612b3dec5f7b9f71490ad5a60ced73fb3373d Researchers at MIT and the Dana-Farber Cancer Institute have developed an innovative AI-powered model called OncoNPC, designed to assist in identifying the origin of tumors in cancer patients where the primary site is unknown. For a small percentage of cancer patients, pinpointing the origin of their cancer can be incredibly challenging, making it difficult to select the most appropriate treatment, as many cancer drugs are designed for specific cancer types. Using machine learning, the researchers created OncoNPC, a computational model capable of analyzing the genetic sequences of approximately 400 genes. This model, based on genetic data routinely collected at Dana-Farbe…  ( 11 min )
    [R] Locally Hosted AI Text Model
    Hello, I want to start a small experiment in my company and install an AI locally on my computer, extra only locally, so no company data can be stolen. The AI should be able to summarize large PDFs. Which textmodel can you recommend me, which is local and not too big? submitted by /u/DesNutella [link] [comments]  ( 9 min )
    [D] neural network designed for model selection
    Hi all, So, I normally work in a different field (macroeconomics/econometrics) and I have been working on a slightly different project for a bit because a change of pace is fun every once in a while. The problem is I am running out of good ideas on where to continue and i would appreciate any input! The problem set up is as follows: I am building a second draft for a forecasting toolbox based on a custom ARIMA framework. The problem with this type of exercise is always model selection. For this project I focus on in-sample criterions as the data has very few time periods overall. The typical strategy to find a decent model is to make an initial guess (i.e., a constant model, a random walk model, etc.) and then run a stepping algorithm that probes the model space and, ideally, it coverg…  ( 10 min )
    [N] Seeking Beta Testers for Qwak's New Vector Store Feature: Revolutionize Your Vector Data Management!
    Hey Redditors, We're thrilled to announce a new feature from Qwak.ai—Vector Store—and we're on the hunt for beta testers to help us make it even better. 🔍 What's Vector Store? Vector Store is a next-level solution for managing vector data. It's designed to empower organizations to harness the power of vector search on their own datasets. Here's what it offers: 🔄 Automated Data Ingestion: Schedule jobs to pull data from databases like Snowflake, BigQuery, and RedShift. 🎯 Easy Vector Search: Use our Python SDK or REST API to search, upsert, or delete vectors effortlessly. 🛡️ Secure Storage: Your vectors are stored securely and are always accessible when you need them. 📖 Learn More About Vector Store 🤝 Why We Need Beta Testers We're keen to gather insights on usability, performance, and any bugs that might pop up. 📝 How to Get Involved Interested in being a part of this? Comment below or shoot us a DM. submitted by /u/Practical-Lecture733 [link] [comments]  ( 9 min )
    [D] Text segmenting using Spacy and BERT
    Hi. I need to segment some short strings, they are about 100 tokens in length (BERT tokens), or 20-40 words. 1 segment per text. I'm currently using Spacy first, and if it fails, then use BERT. It's alright, but BERT is not really up to par to what I hope for. I'm wondering if there is some better use, as this is my first NLP project. I've trained the huggingface BertForTokenClassification to label the text as either part of segment, or not part. So binary token classification, then smooth out outliers and pick the longest segment of each text. Each string has only 1 segment. I've trained the BERT with 500 examples. I can easily make more train data, though training on my laptop takes time. If there are better/alternative approaches, I'd love to hear them. Regex rules etc. don't apply, which is why I'm using DL models. Especially I feel like I should segment the whole segment at a time, not by token. submitted by /u/Infamous-Bank-7739 [link] [comments]  ( 9 min )
    [D] Appreciation post for Folktables datasets
    I want to take a second to express my appreciation for the Folktables datasets! Folktables is a Python package that contains datasets derived from US Census data. The datasets cover topics about income, employment, health, transportation, and housing. They are quite useful for studying the effects of distribution shifts on ML models. For example, one could design experiments to: 1. Study the model's performance under geographic distribution shifts: Each prediction problem in Folktables can be instantiated with data from every US state. So one could use Folktables to study questions around geographic distribution shifts. For example, we can train a classifier using data from California and then evaluate it on data from Michigan. 2. Study the model's performance under temporal distribution shifts. Folktables contains data for several years, which in itself constitutes a form of temporal distribution shift. So, we can train a classifier using employment data from California in 2014 and evaluate how its equality of opportunity violation or accuracy varies over time. Finding non-synthetic (and open-access) datasets that exhibit these behaviors is so hard! Yet, it's quite easy to encounter them in production environments. 🫠 So, big kudos to the UC Berkeley and Toyota Research Institute research teams for crafting these datasets. Folktables Python package: https://github.com/socialfoundations/folktables Link to paper where Folktables was introduced: https://arxiv.org/pdf/2108.04884.pdf submitted by /u/santiviquez [link] [comments]  ( 9 min )
    [R] Problem with understanding Extended Kalman Filter
    Hey, I'm working on Attitude-Heading Reference System, and I would like to make it with EKF. My approach is to use Euler angles for computing [roll, pitch, yaw] matrix. I read a book "Small unmanned aircraft: theory and practice" and watched few videos on YouTube that are reffering to this particular book. But there is a thing in correction step that I do not fully understand. To be clear, the correction looks as follows: x = x_(t-1) + K*(z - h(x)) Both in the book and in videos, the state vector "x" is [roll, pitch]. But "z" and "h(x)" are the actual and predicted accelerometer readings [ax, ay, az]. So it looks to me, that they try to correct prediction of angles with readings in acceleration: [roll, pitch] = [roll, pitch]_(t-1) + K*[ax, ay, az]. What am I missing? submitted by /u/Skrz_ [link] [comments]  ( 9 min )
    [P] Whisper Large Benchmark: 137 DAYS of Audio Transcribed in 15 Hours for Just $117 ($0.00059/min)
    We recently benchmarked whisper-large-v2 against the substantial English CommonVoice dataset on a distributed cloud (SaladCloud) with consumer GPUs. The Result: Transcribed 137 days of audio in 15 hrs for just $117. Traditionally, utilizing a managed service like AWS Transcribe would set you back about $10,500 for transcribing the entirety of the English CommonVoice dataset. Using a custom model? That’s an even steeper $13,134. In contrast, our approach using Whisper on a distributed cloud cost just $117, achieving the same result. The Architecture: Our simple batch processing framework comprises: Storage: Audio files stored in AWS S3. Queue System: Jobs queued via AWS SQS, with unique identifiers and accessible URLs for each audio clip. Transcription & Storage: Post transcript…  ( 10 min )
    Need help with CNN data format for genomics [R]
    Hello, I wrote CNN using tensor flow to predict phenotypes (cognitive impairment (0/1) and language ability (0-3)) from .vcf files. I transformed the .vcf into tabular format and after determining which columns to keep, I have Chromosome Position Mutation (taken from ref/alt columns) Genotype And merged it with the phenotype data so basically every row has the cog and language scores. I feel like this is a bad way to go about doing this since the model is likely predicting cog/language scores for each mutation rather than each patient as a whole. Am I wrong? How can I fix this so it’s more of a composite of all mutations per subject that the model trains on? Thanks! submitted by /u/Pristine_Ingenuity49 [link] [comments]  ( 9 min )
    [D] Automatic split a video into chapters?
    Given a video with audio, we can use ASR to get a script of the sentences and timestamps. We are looking for a way to group the sentences into chapters. There are several companies that are doing it nicely - Google on YouTube, Assembly AI, but we couldn't find any good resource or paper that explains the research behind how they do it. BertTopic seems to give us just a topic for each sentence, but not a way to split the video. it also doesnt account for timestamps. Wondered if anyone has any links or any other ideas? Thanks very much! Lior submitted by /u/liormessinger [link] [comments]  ( 9 min )
    [P] Data structures for large sequences
    Hi everyone I've been working for quite some time on this project and any feedback will be greatly appreciated. Basically, I've been testing different data structures for large sequence prediction and clustering. Mainly SARSCov2 viral sequences due to its availability. At the moment, I have published two preprints https://www.researchsquare.com/article/rs-2797280/v3 https://www.researchsquare.com/article/rs-1691291/v1 and a general summary of the findings can be found here. https://github.com/TavoGLC/SARSCov2Solar https://www.kaggle.com/code/tavoglc/a-computational-description-of-sarscov2-adaptation I've tried to publish it a couple of times with no success and no comments regarding its accuracy or any potential problems. I hope you guys can check it out and provide some feedback if possible. Just for full transparency, I'm trying to raise funds to further develop those techniques. Donations are extremely welcomed but not encouraged at the moment, just disclosed for transparency. submitted by /u/TavoGLC [link] [comments]  ( 9 min )
    Using LLMs to Analyze and Extract Insights from Device Logs [P]
    I work with device logs that are massive text files, filled with data that's hard to go through manually. I'm primarily interested in extracting specific events or insights, such as security incidents or device malfunctions. The conventional method would be to use regular expressions to filter out relevant information since the logs are structured, but I'm curious about leveraging Large Language Models for this task. I've experimented a bit with zero-shot learning for text summarization but didn't get satisfactory results. Before I invest more time into fine-tuning an LLM, I'd love to hear from anyone who has experience or advice on how to approach this problem. Could LLMs potentially make the process more efficient and effective? Any pointers or suggestions would be greatly appreciated. submitted by /u/Practical_Mango_8720 [link] [comments]  ( 9 min )
  • Open

    I made another AI game, the future of NPCs!
    Hello, fellow AI enthusiasts! After the positive response to Bargainer.ai, I got really excited about the potential of this technology in larger scale video games like World of Warcraft or GTA for example. I'm happy to announce that I'm now releasing - Convince the Bouncer! Chat with an AI Bouncer and try to gain entry to Elysium, the most elite night club. Don't worry; it's fairly easier than getting into Berghain. Give it a spin here: convincethebouncer.com P.S.: Get the VIP Pass from the Bouncer, and you might access an upcoming AI platform early! :) Questions or ideas? Let me know. Thanks a bunch! submitted by /u/gavo_gavo [link] [comments]  ( 9 min )
    Vote bot
    Any good recommendations for a vote bot to cast votes for a insignificant online poll? It’s a no security website. Just refresh and hit a vote button. submitted by /u/fa6664 [link] [comments]  ( 9 min )
    🤖Inside Tencent Hunyuan, Ant's Financial LLM, and Zhipu AI's Rising Valuation
    submitted by /u/trcytony [link] [comments]  ( 9 min )
    We Polled Different Audiences on the Simulation Trilemma. Techies Favor Simulation (50-81%), Others Bet 0%. Are We Overestimating Simulation Probability, and Why?
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
    Best current long-form text summarizers?
    I check every now and then. I use some good ones that can summarize a short YouTube video, or a brief-ish article. But I really want something that can chomp down a whole book and distill its essence without making "summaries of summaries", which tend to increase inaccuracy and errors. A good summary is concise and precise, and I want flexibility with bullet points and level of detail. Having issues with ChatGPT-based tools' token limits, and some that purport to support GPT4 (I could use that "advanced reasoning") but but have to fall back to GPT3 for various errors and reasons. So I'm open to Claude-based ones (may be too early, mo' tokens) and other proprietary engines. What is everyone using, including paid (rightfully so if they offer value for the money) services? submitted by /u/Torley_ [link] [comments]  ( 9 min )
    Notes app doodles to images for architecture design concept iterations using ControlNet and SDXL
    submitted by /u/Alternative_Lab_4441 [link] [comments]  ( 8 min )
    AI WILL TAKE OVER THE WORLD. Meanwhile the AI...
    submitted by /u/Bananas8ThePyjamas [link] [comments]  ( 8 min )
    Meta is reportedly working on a new AI model to rival GPT-4
    Meta is working on a new AI model to rival GPT-4. The company is acquiring AI training chips and building data centers to create a powerful chatbot. CEO Mark Zuckerberg wants it to be free for companies to create AI tools with. Meta is assembling a group to build the model and speed up the creation of AI tools that can emulate human expressions. There are rumors of generative AI features and the launch of AI 'personas' this month. Source : theverge.com submitted by /u/NuseAI [link] [comments]  ( 9 min )
    If AI becomes conscious, how will we know? | "Scientists and philosophers are proposing a checklist based on theories of human consciousness"
    submitted by /u/Tao_Dragon [link] [comments]  ( 9 min )
    Help me in finding right resources to understand the world of AI from a business perspective
    I’m a Business generalist who worked with multiple tech led businesses and looking to understand fundamentals of the technology from scratch. Please help me with any relevant courses/reading material/YT channels etc. that can help me kickstart the journey. P.S. I have a brief background with Business Analytics but haven’t done any coding extensive ever in my life. Thanks in Advance submitted by /u/Firm_Brother_7124 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/10/2023
    Chinese big tech Tencent has announced a medical large-scale model and unveiled a brand new matrix of AI products for various scenarios, including intelligent Q&A, a family doctor assistant, and a digital medical imaging platform. The drug discovery platform “Yunshen” (iDrug) has also revealed a new protein structure prediction algorithm framework called “tFold.”[1] Morgan Stanley is gearing up to implement an artificial intelligence-driven chatbot, a strategic move aimed at delivering valuable insights and administrative support to their team of financial advisors.[2] A fresh Russian AI bot has displayed larger potential than the most famous chatbot created by US-based OpenAI, IT giant Yandex alleged in remarks to RIA Novosti on Saturday.[3] Meta is developing a new, more powerful AI system, Wall Street Journal reports.[4] Sources: [1] https://drug.ai.tencent.com/en [2] https://voonze.com/morgan-stanley-introduces-ai-powered-chatbot-for-enhanced-services/ [3] https://menafn.com/1107040379/Russian-AI-bot-shows-larger-potential-than-ChatGPT [4] https://www.reuters.com/technology/meta-is-developing-new-more-powerful-ai-system-wsj-2023-09-10/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Meta plans to match OpenAI's GPT-4 with its new AI model
    Citing GPT-4 as the benchmark, Meta is reportedly gearing up to train a new, highly sophisticated AI model. The company is investing heavily in AI training chips and boosting its data centers to support the ambitious project. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/ts4a6reeuinb1.jpg?width=1440&format=pjpg&auto=webp&s=940be18b0c5f13e2762e1664a96274b314fe00df Meta’s vision for its new AI model Meta's aim is to create a powerful chatbot in line with OpenAI’s GPT-4 capabilities. The company has reportedly been in pursuit of Nvidia H100 AI-training chips and is amplifying its infrastructure. The idea is to independently train its new model without outsourcing to platforms like Microsoft's Azure. Efforts and roadblocks Meta plans to begin the training of this LLM early in 2024, emphasizing free access for companies. Despite its grand vision, Meta has encountered obstacles like researcher attrition and contentious resource allocation amidst multiple LLM projects. Noteworthy is the intense competition from major players such as Apple, Google, and Amazon to integrate widely generative AI in their user interface. Broader implications While OpenAI has not immediately revealed plans for a GPT-5, other tech giants are investing heavily. Apple's investment in its "Ajax" AI model signifies the increasing race to advanced AI. This move by Meta represents the ongoing trend of tech conglomerates expanding in the AI space, a fact revealed through Google and Microsoft’s use of AI in their productivity tools and Amazon's ongoing developments. (source) P.S. If you like this kind of analysis, I write a free newsletter that explores the latest AI developments. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    Luiza GPT Assistant - virtual girlfriend or boyfriend based on neural network ChatGPT and Telegram
    Luiza GPT Assistant is a simple virtual assistant that mimics your friend, girlfriend or boyfriend, based on neural network ChatGPT and Telegram. Get unique good morning wishes, goodnight, compliments or just chat. https://github.com/r57zone/LuizaGPTAssistant submitted by /u/r57zone [link] [comments]  ( 9 min )
  • Open

    Hyperellipsoid surface area
    Dimension 2 The equation for the perimeter of an ellipse is where a is the semimajor axis, e is eccentricity, and E is a special function. The equation is simple, in the sense that it has few terms, but it is not elementary, because it depends on an advanced function, the complete elliptic integral of the […] Hyperellipsoid surface area first appeared on John D. Cook.  ( 6 min )
    Solve for ellipse axes given perimeter
    I posted some notes this morning on how to find the perimeter of an ellipse given its axes. The notes include a simple approximation, a better but more complicated approximation, and the exact value. So given the semi axes a and b, the notes give three ways to compute the perimeter p. If you are […] Solve for ellipse axes given perimeter first appeared on John D. Cook.  ( 5 min )
    Possible and actual football scores
    The home team lost in a new way yesterday. The Baltimore Ravens beat the Houston Texans by 25-9. This was the first time that score has been seen in the NFL. Possible individual team scores How many scores are possible? It is possible to score any number of points except 1. You can score 2 […] Possible and actual football scores first appeared on John D. Cook.  ( 5 min )
  • Open

    Falcon 180B foundation model from TII is now available via Amazon SageMaker JumpStart
    Today, we are excited to announce that the Falcon 180B foundation model developed by Technology Innovation Institute (TII) is available for customers through Amazon SageMaker JumpStart to deploy with one-click for running inference. With a 180-billion-parameter size and trained on a massive 3.5-trillion-token dataset, Falcon 180B is the largest and one of the most performant models with openly accessible weights. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Falcon 180B model via SageMaker JumpStart.  ( 14 min )
    Amazon SageMaker Domain in VPC only mode to support SageMaker Studio with auto shutdown Lifecycle Configuration and SageMaker Canvas with Terraform
    Amazon SageMaker Domain supports SageMaker machine learning (ML) environments, including SageMaker Studio and SageMaker Canvas. SageMaker Studio is a fully integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models, improving […]  ( 10 min )
  • Open

    How can in log the console verbose to an xslx file every game played in Stable Baselines 3?
    I want to log information like this: ------------------------------------ | rollout/ | | | ep_len_mean | 48.1 | | ep_rew_mean | 2.71 | | time/ | | | fps | 452 | | iterations | 1000 | | time_elapsed | 11 | | total_timesteps | 5000 | | train/ | | | entropy_loss | -0.67 | | explained_variance | -32 | | learning_rate | 0.0007 | | n_updates | 999 | | policy_loss | -0.0567 | | value_loss | 0.0158 | ------------------------------------ to an excel file. Here is my main code: gymnasium.env = snakeEnv() # data_manager = snakeEnv.get_data_manager() # Create an A2C model model = A2C("MlpPolicy", gymnasium.env, verbose=1, device="cuda") # Train the model (replace 'total_timesteps' with appropriate values) model.learn(total_timesteps=100_000, log_interval=1000) Here is the relevant code in my agent file: class snakeEnv(gym.Env): def __init__(self): super(snakeEnv, self).__init__() # Define action and observation space # They must be gym.spaces objects # Example when using discrete actions: self.initNames() self.action_space = spaces.Discrete(3) self.gameCount = 0 self.record = 0 self.reward = 0 self.score = 0 self.game = SnakeGameAI(self.selectedChallenge) self.observation_space = spaces.Box(low=-1000, high=1000, shape=(11,), dtype=np.uint8) def step(self, action): self.reward, self.done, self.score = self.game.play_step(action) self.observation = self.getState(self.game) self.info = {} return self.observation, self.reward, self.done, self.info def reset(self): self.gameCount += 1 self.data_manager.logData(self.gameCount, self.score, self.record, self.reward, self.game.getDeathReason(),self.game.getHeadPos()) self.game.reset() observation = self.getState(self.game) if self.score > self.record: self.record = self.score return observation It would be nice to be able to log the data in the reset function. I know how work with xslx files, the main things is just being able to get the model data. submitted by /u/MrHank2 [link] [comments]  ( 9 min )
    Mid turn actions
    Hello everyone! I want to develop a DRL agent to play a turn-based 1v1 game and I'm starting to plan how to handle things in the future. One potential problem that I thought of is that there is a possible mid turn one-sided decision. An abstraction of the game would be like: There are two players: player A and player B. At the start of each turn, each player chooses an action between 3 possible actions. If player A chose a specific action (let's say action 1), the game asks player B to make a decision (let's say block or not block) and vice versa. Actions are calculated. Next turn starts. What would be a good approach to handle that? I thought of two possible solutions: 1. Anticipate the possibility of that mid turn decision beforehand adding a new dimension to the actions space (e.g. take action 3; if opponent takes action 1, block). That sounds that it could create credit assignment problems e.g. giving credit to the second action when it actually didn't happen. 2. Make two policies with shared value functions. That sounds complicated and I saw that previous works like DeepNash actually did that, but I don't know what problems could arise from that. Opinions/suggestions? Thanks! submitted by /u/victorsevero [link] [comments]  ( 9 min )
    Help with continuous action spaces
    Newbie here. How does an continuous action space look like? E.g. The agent should choose an integer between 1 and 10. With discrete action space it could look something like this: def step(self, action): if action == 0: self.chosenNumber = 1 ... if action == 9: self.chosenNumber = 10 how would this look like with an continuous action space? submitted by /u/ChampionshipWhole467 [link] [comments]  ( 9 min )
  • Open

    NVIDIA Grace Hopper Superchip Sweeps MLPerf Inference Benchmarks
    In its debut on the MLPerf industry benchmarks, the NVIDIA GH200 Grace Hopper Superchip ran all data center inference tests, extending the leading performance of NVIDIA H100 Tensor Core GPUs. The overall results showed the exceptional performance and versatility of the NVIDIA AI platform from the cloud to the network’s edge. Separately, NVIDIA announced inference Read article >  ( 7 min )
  • Open

    System combines light and electrons to unlock faster, greener computing
    “Lightning” system connects photons to the electronic components of computers using a novel abstraction, creating the first photonic computing prototype to serve real-time machine-learning inference requests.  ( 9 min )

  • Open

    [N] Meta Is Developing a New, More Powerful AI System as Technology Race Escalates
    submitted by /u/hzj5790 [link] [comments]  ( 9 min )
    [D] Data Extraction using fine-tuned LLM?
    Hey Reddit, I'm working on a tool to pull data from highly irregular Excel files. I've gotten reasonable results which is extremely fast with standard Python coding, but it's far from perfect due to the lack of standardized templates. Interestingly, when I tested ChatGPT-4 on a sample table, it did a decent job at data extraction. However, relying solely on GPT-4 has its downsides like token limits and slow processing speed (and data privacy issues). Plus, splitting the Excel sheet to fit within these limits results in loss of context and data. I'm considering fine-tuning a language model to post-process data that was in a Pandas DataFrame (perhaps converted to JSON). Has anyone had success with this approach or have alternative recommendations? I've tried Langchain, but it wasn't helpful. I have figured out to extract the relevant columns, but the post-processing part is where I am considering using an LLM which understands the domain and what needs to be extracted based on the examples I feed it. Looking forward to your thoughts! And would be happy to answer any additional questions. submitted by /u/rs35plus1 [link] [comments]  ( 9 min )
    [D] Should I transfer all my work to PyTorch already?
    I've been using Tensorflow since 2017. I know it wasn't ideal or easy back then, but as an early adopter, I became very proficient with it and it has improved a lot since then. I have developed and deployed many custom models in low-level TF, both with and without utilizing the Keras abstractions. I am very comfortable with it in general. But I'm noticing now that Pytorch is gaining more popularity, all the younger practitioners, who got into deep learning within the last 3-5 years, are Pytorch adopters. I've also heard rumors that even googlers are also abandoning TF. I started playing around with Pytorch and as a TF expert, I couldn't help but getting annoyed at how far it is lagging behind in many abstractions and optimizations. I know things are getting better now with the Pytorch 2.0 and introducing some optimization such as the "compile" functionality, but still many of the pytorch project tools remain in beta such as Torchtext and I find many things very annoying, such as having to set the device and pass it on to layers if you want GPU acceleration, having to install Torchtext and other processing libraries separately, or having to use a Dataloader and the limited data type supports for torchdataset. Most people who have not mastered Tensorflow would not relate to my annoyance. Anyhow I'd really prefer to stay within my comfort zone and continue to develop and improve in TF, but if TF is dying, then I better not to, right? So should I convert? Is it indeed dying? submitted by /u/DieselZRebel [link] [comments]  ( 9 min )
    [D] This is my first blog on medium about Machine Learning please have a look and show some love
    submitted by /u/indusop [link] [comments]  ( 9 min )
    [D] Bayesian_Net: A neural network for Bernoulli naive Bayes classification
    What do you think of my recent work? https://github.com/jacobmcasey/bayesian_net At its core, its a neural network for Bernoulli naive Bayes. It uses a 3-layer neural network in NumPy for predicting priors of Bernoulli Naive Bayes Would love to get your feedback on this classifier project! submitted by /u/Ok_Grape_3670 [link] [comments]  ( 9 min )
    [D] How to solve loss spikes in pre-training?
    It happens on and off and I have tweaked many hyperparameters but nothing seems to work significantly better. Is there a recommendation on what to check/tweak? submitted by /u/MrAaronW [link] [comments]  ( 9 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 9 min )
    [D] Pdf text to speech
    Hey, I would like to listen to my PDFs I got. Week would the best / easiest way to get an mp3 from my pdfs with good voice? I got a rtx 3070 to run it locally. submitted by /u/Independent_Hyena495 [link] [comments]  ( 9 min )
    [D] Naive pipelining of LLM inference across multiple small GPUs? (self.MachineLearning)
    For transformer inference, is it ok to pipeline blocks/layers to multiple low-memory GPUs (lower blocks to GPU-A and upper blocks to GPU-B) ? A to B bandwidth should be relatively low, and each GPU needs half the model memory. This increases inference latency - fine for our use case... Not sure how this would affect the KV cache ? The excellent Lil'Log article suggest I read the training optimization article, which has this image which is for training but not for inference submitted by /u/yazriel0 [link] [comments]  ( 9 min )
    [D] Best architecture for prediction logging in production
    I am in the process of setting up the first iteration of model monitoring in production. Models are currently served through torchserve in kubernetes (on GCP). In the pasts few years I've been mainly using Vertex AI off-the-shelf tools. Ideally, I'd like to store the following data: Request input. Model prediction. User feedback/groud truth (this might come at a later time). Anything that I might be missing out. This would enable me to implement a wide array (either using libraries like whylogs or evidently that have a lot of the reporting side of things already baked in) of cheks for drift and model degradation. I am wondering what would be the best set up to achieve this. This is currently what I came up with: Save a payload at inference time containing all the fields descr…  ( 10 min )
    [D] HF accelerate vs native pytorch autoscaling for mixed precision training
    I want to start using mixed precision in my training, particularly for CV with high-resolution images. HF accelerate seems quite popular nowadays and looks nice. However, in the past I've invested in learning things like Pytorch-Lightning which look good in a minimal example, but actually add more annoyance than they're worth. Pytorch also can do these things, and the boiler plate doesn't look worse at a glance: https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/ https://huggingface.co/docs/accelerate/index ​ Any experiences with either? Cheers! ​ submitted by /u/AuspiciousApple [link] [comments]  ( 9 min )
    [D] Using LLMs to build Evaluation Sets
    Hi. Is this really better practice than having human linguists build evaluation sets for domain-specific data? submitted by /u/throwaway34334534 [link] [comments]  ( 9 min )
    [P] RLHF and Its Alternatives
    submitted by /u/seraschka [link] [comments]  ( 9 min )
    [P] Open-source Python package for Exploratory Data Analysis for modern NLP applications looking for contributors.
    If you're working on NLP and you're serious about understanding your data, check out Wordview. Wordview is an open-source Python library designed to make Exploratory Data Analysis (EDA) for text for modern NLP applications simpler and more efficient. It consolidates various analysis tools under one roof —think document metrics, multi-word expressions, language identification, POS tags, and even bias analysis. We're looking for contributors to help us expand and refine its capabilities. Join us in making NLP data analysis easier and more insightful with Wordview! Here is the link to the GitHub page: https://github.com/meghdadFar/wordview If you just want to use Wordview without contributing, you're very welcome too. Note that it's pretty new and we are still testing things. Please hence feel free to report bugs and send us your feedback and opinions. Looking forward! submitted by /u/SyntaxTreeHugger [link] [comments]  ( 9 min )
    [D] Question about dealing with EEG inter-subject Variability when training ML models
    Hi, I'm an undergrad student working on a machine learning project about motor imagery classification for BCI. One requirement for the project is that I need to implement LIME an explainable AI tool on my model. I ran into a problem rather quickly as my model accuracy quickly decreases as I try and add more subjects to the training data, I'm the publicly avaliable EEG Motor Movement/Imagery Dataset from physio.net. The features I'm currently using for training my model are SVD entropy, Spectral Entropy, Hjorth mobility , Hjorth complexity and CSP components after applying overlapping filter bank. One method I've been suggested is to train models seperately for each subject, I don't know how to go about this should I construct a new model for each subject or should I keep the architecture but reset the training weight, also would this defeat the purpose of implementing LIME in the first place. I'm wondering if there's a way to deal with inter-subject variability without having to make/ train models for each subject seperately. I'm sorry if the choice of features doesn't make much sense since I'm basically self-taught and I have no prior knowledge about EEG or BCI TLDR; how do I deal with inter-subject variability without having to make/ train models for each subject seperately. submitted by /u/Necrozx13 [link] [comments]  ( 9 min )
    [P] Automatic hyperparameter tuning for CatBoost and LightGBM
    Hey friends, I have developed a library, called 100gecs, that makes hyperparameter tuning on LightGBM and CatBoost models trivially easy. Background LightGBM and CatBoost are gradient boosted tree models, like XGBoost, and in many cases the best baseline model in supervised learning tasks on tabular data. They work by iteratively fitting trees on data, with each subsequent tree "correcting" on some level the prediction of the prior tree. Here's a good intro, if you want some more background on these methods. Hyperparameter tuning promises an optimally or near-optimally configured model, thus enabling you to get the best baseline model you possibly can. Summary 100gecs provides custom child classes of LGBMClassifier, LGBMRegressor, CatBoostRegressor and CatBoostClassifier that can be …  ( 10 min )
    [D] Does Entry Level ML exist (in Europe)?
    In your informed opinion, how would an archetypical career in ML look like? Looking at job postings, at least in Europe, it looks like the minimum required experience is around 3 years. There seems to be a good demand for mid-senior level positions, but a void for entry level. As I understand it, most DS departments are not big enough to spare resources for newbies that must be trained, the need is for a few but seasoned engineers. How far is my guess from the truth? And, most importantly, how could new candidates (let's say recent MSc in Data Science, for instance) get into the industry? Through analyst/DE roles? submitted by /u/madway99 [link] [comments]  ( 9 min )
    [D] Any free ai text to speech programs that let me test my own voice models
    Hello everyone I've wanted to test some AI voice models I created with a free AI Text to-speech program Are there any available to test my voice models submitted by /u/mrbeanfan64 [link] [comments]  ( 9 min )
    [D] Cant get TensorFlow or PyTorch to detect my 4090 Laptop GPU
    I dont know what it could be the issue, I have tried everything from reinstalling the pip packages, running them on a virtual env, reinstalled CUDA, trying to run on PyCharm instead of VSCode, I got no idea what to do, does anyone know why it could be happening? Maybe something related to environment variables that I maybe messed up after watching like 5 different vids on the topic? What could it be? submitted by /u/someredditguy374632 [link] [comments]  ( 9 min )
    [D] CodeLlama-xb/CodeLlama-xb-Python vs. CodeLlama-xb-instruct
    Hey guys, so I have googled around and read the documentation but I am still confused between what's the difference between CodeLlama-xb/CodeLlama-xb-Python vs. CodeLlama-xb-instruct? I know the xb model is the base model (for several languages) and the Python model specializes in Python, but what's the instruct model and how is it different from the other 2 models? Would really appreciate your help. Thanks a million! submitted by /u/--leockl-- [link] [comments]  ( 9 min )
    [R] Make use of CPUs on 8 servers?
    I have a blade system with 8 blades. Each blade has 28 cores (e5-2680 v4) and 180gb ram. I would love to run an LLM + Local Files ( kind of like OpenAI does with their api) and run something similar but most “privategpt” LLMs need GPU support and don’t look like they make use of a multi device setup. Anyway to get crunching on CPU on my setup submitted by /u/programmrz [link] [comments]  ( 9 min )
    [Discussion] Seeking Guidance: Transitioning from Trucking to Tech
    Hello everyone! I'm a truck driver with a passion for creating music. While I'm familiar with basic IT tasks from my personal experiences, I'm new to the tech industry and am eager to break in. I've been exploring AI tools like ChatGPT and have become comfortable navigating GitHub. These experiences make me confident about diving into the tech field. After some research, I'm contemplating three domains: Cloud Technology: It seems stable and beginner-friendly. I'm leaning here primarily because I've heard cybersecurity is saturated. Cybersecurity: Interested but cautious due to market saturation. Generative AI/Data: My true passion lies here, but I've gathered that projects matter more than certifications for entry-level roles in this niche, especially for someone without a degree. Given my background and no technical degree, which field would you suggest I pursue? Should I focus on certifications before taking on projects? I genuinely appreciate any insights! (What do you guys think about this “CloudRoad map” is it good advice?) https://www.madebygps.com/cloudcamp/ [Discussion] submitted by /u/motluv_them [link] [comments]  ( 9 min )
  • Open

    AI apps product development canvas – Part 1
    AI Apps are domain-infused, AI/ML-powered applications that continuously learn and adapt with minimal human intervention in helping non-technical users manage data and analytics-intensive operations to deliver well-defined operational outcomes. I originally introduced the idea of a “Data Product Development Canvas” as one of the capstone deliverables (the other being the data science Hypothesis Development Canvas)… Read More »AI apps product development canvas – Part 1 The post AI apps product development canvas – Part 1 appeared first on Data Science Central.  ( 22 min )
  • Open

    What are some of your favorite ai discoveries you've found? What do you think is possible and probable to come in the near future with ai to stay tuned for?
    I've been dabbling around with pi.ai and I love it and feel like it's only going to get better and better at what it does. I'm curious if there is any interesting new ai bots or discoveries that haven't yet made it mainstream but have tons of use in one way or another for the average consumer.. Things like a language learning ai, or an ai that can read through a textbook or pdf and help you learn it's contents by practically interacting with it to help you comprehend better... there's so many interesting ai things I look forward to seeing submitted by /u/mikel0202 [link] [comments]  ( 9 min )
    Is there any AI tool to filter 5 star ratings and just find the ạverage of 1 to 4 star rạtings in Google maps to find the genuine rating of a business ?
    I am tired of fake ratings and fake reviews in Google maps and I hạve been cheated many times by fake 5 star reviews. I am just looking to find the genuine overạll rating of a place by filtering out 5 star ratings because fake ratings are mostly 5 star and just finding the average of 1-4 ratings so that we can find the overall genuine rating of a Business. Is there any AI tool or any way or any application for that ? Need suggestions on this. submitted by /u/ramesh423 [link] [comments]  ( 9 min )
    Is there anyway to mix two faces together and generate a brand new face using AI?
    I'm interested to know if it's possible to mix two faces together, such as Tom Cruise and Brad Pitt, and create a brand new face from those two faces. submitted by /u/Glad-Ad-8953 [link] [comments]  ( 9 min )
    AI generated video
    Cyberpunk montage I made using Midjourney and RunwayML along with CapCut for the transitions, etc. Here’s some brief tutorial steps. Use Midjourney to make pictures that you like. I would keep it to the default aspect ratio for best results. Use RunwayML to generate 4 second videos of the pics. Once you get some videos you like, save them and upload them to the CapCut app on your phone. submitted by /u/Exitium_Maximus [link] [comments]  ( 9 min )
    How valuable is the UK’s AI industry?
    The UK's AI industry is highly valuable, but its exact worth is difficult to determine. The industry is growing rapidly and has the potential to contribute significantly to the country's economy. However, there are challenges in accurately measuring the value of the AI industry, as it encompasses a wide range of sectors and applications. Some estimates suggest that the UK's AI industry could be worth billions of pounds, with the potential to create thousands of jobs. Investment in research and development, as well as the development of AI talent, are crucial for the growth of the industry. Source : https://www.ft.com/content/eeaa57a3-19ed-45d9-8705-2517c81e60ba submitted by /u/NuseAI [link] [comments]  ( 9 min )
    New physics-based self-learning machines could replace current artificial neural networks and save energy | "Neural networks on neuromorphic computers"
    submitted by /u/Tao_Dragon [link] [comments]  ( 9 min )
    Top 8 Courses & Certifications on AI Ethics
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 8 min )
    The Accelerators Manifesto - Accelerating AI and our future
    submitted by /u/JulioMedina [link] [comments]  ( 9 min )
  • Open

    Neural Networks vs Tabular Data
    submitted by /u/keghn [link] [comments]  ( 9 min )
    LLM Training: RLHF and Its Alternatives
    submitted by /u/nickb [link] [comments]  ( 9 min )
    (Pt 3) Spatio-Temporal Perception Logic
    submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
  • Open

    Possible to find 1-day/week contract to help fund RL PhD study?
    Hi, I have been accepted onto a PhD at a top UK university and I'm looking for some additional income by working a day a week on other projects. Does anyone have experience of finding something like that to help with costs during a PhD? I have just completed my MSc with really high marks and have published a paper on conversational agents. I have strong general machine learning and data analysis knowledge, strong knowledge of conversational agents and a specialism in reinforcement learning. Before returning to study, I worked for several years in engineering teams so I know how to get stuff done in a professional context too. The only thing is, I have no idea how to actually find something that could be a day a week and pay a worthwhile day rate to help with crazy cost of living. Thank you for any tips! submitted by /u/EDMismyO2 [link] [comments]  ( 9 min )
    Policy Loss Oscillation
    Is there any insight to be gained about these policy and value loss curves? I hear it’s difficult to judge a good policy by its loss curve, but my agent is hitting a wall in terms of progression over the baseline agents I am evaluating against. For context, my policy is generated by a neural network with the core of it being a dot product of the state embedding against a set of actions embeddings. Any help/understanding would be greatly appreciated. submitted by /u/atomicburn125 [link] [comments]  ( 9 min )
    Seeking Advice on Electricity Trading Problem in Day-Ahead Market
    I'm attempting to address an issue related to electricity trading in the day-ahead market. The objective is to submit 24 bids for each hour. Each bid is represented as a vector of length 10, with 5 values indicating the price and the other 5 indicating the quantity of electricity. For instance, given a sample action vector [w, x, y, z, ...], it operates as follows: if the price is greater than w, buy/sell x units of electricity, if the price is greater than y, buy/sell z units of electricity, and so forth. I possess three years' worth of data, including crucial features like weather conditions, coal prices, wind speed, net load, forecasted load, locational marginal price, and more. Initially, I trained a Deep Q-Network (DQN) to tackle this problem, and it's performing quite well. However, the DQN provides a scalar action for each hour, neglecting price points. It acts regardless of the price. I'm aware that I can explore solutions like Proximal Policy Gradient to generate a vector action that includes both electricity unit amounts and prices. I have three questions: Is it possible to solve this problem using Dynamic Programming techniques? While I understand it's not an exceedingly difficult problem, can I expect some results if I attempt to apply DP? How challenging might it be to output a vector instead of a scalar, with the vector being monotonically increasing? What's the recommended approach for a problem of this nature? Is it worthwhile to explore RL and specifically Proximal Policy Optimization? How would you approach such a problem while keeping it simple and avoiding unnecessary complexity? Any guidance or insights would be greatly appreciated. submitted by /u/uonliaquat [link] [comments]  ( 10 min )
    Reinforcement Learning in Games - Learning the level, not the game
    I've been watching a tonne of reinforcement learning videos on youtube, and was initially very impressed, but as I watched even more, especially by the same youtubers, I started to notice a distinct issue. Their AI aren't learning to play the game, they're learning to play the level. They'll put in 10s or 100s of hours reinforcing the level. It'll play over and over again, selecting for what works, and dropping what doesn't. And over time, the AI will be amazing at that level. But, if you take that reinforced data, and move to level 2 of that game... it'll be practically useless. When humans play a game, say a brand new human who's never played video games before, we'll use reinforcement learning too. Most everything we do is reinforcement learning. Our brain works on reinforcement. …  ( 11 min )
  • Open

    Adversarial Patches and Frames in PyTorch
    Adversarial patches and frames are an alternative to the regular $L_p$-constrained adversarial examples. Often, adversarial patches are thought to be more realistic — mirroring graffitis or stickers in the real world. In this article I want to discuss a simple PyTorch implementation and present some results of adversarial patches against adversarial training as well as confidence-calibrated adversarial training. The post Adversarial Patches and Frames in PyTorch appeared first on David Stutz.  ( 7 min )
  • Open

    Making life friendlier with personal robots
    Sharifa Alghowinem, a research scientist at the Media Lab, explores personal robot technology that explains emotions in English and Arabic.  ( 8 min )
  • Open

    How you define center matters a lot
    Earlier I wrote a post showing what happens when you start with an equilateral triangle, then repeatedly subdivide it into smaller and smaller triangles by drawing lines from the centroid (barycenter) to each of the vertices. I mentioned in that post that I moved the code for finding the center to its own function because […] How you define center matters a lot first appeared on John D. Cook.  ( 5 min )

  • Open

    [Discussion] Anticipatory Customer Support using ML - Your Thoughts?
    Hey everyone, I'm Raynel, working on a CRM platform and actively considering the future integration of AI. One idea I'm particularly excited about is anticipatory customer support. The goal is to proactively address customer needs, perhaps even before they realize them, using ML-driven insights. Has anyone delved into this concept or tried implementing it? I'd love to hear thoughts, potential pitfalls, or even success stories. Thanks in advance for your insights! submitted by /u/bess_point [link] [comments]  ( 9 min )
    [D] What is good replacement for package manager
    I used to build on top of conda, or make docker containers with conda package manager, now that is so broken it is impossible to install old pytorch on a fresh environment. Any way to replace the existing requirements with something better? submitted by /u/AardvarkNo6658 [link] [comments]  ( 9 min )
    [D] Are statistics and ML too "ununified" as fields?
    I thought it would be interesting to open this discussion, I would like to hear what you guys have to say about that. I know that ML folks use statistics all the time, but I am often under the impression that it's mostly the basics. I understand that inference and prediction are inherently different, but I would assume the common knowledge will be much more vast. What is your perspective on that? Am I just missing the point? Is there room for improvement in the future? Do you think statistics literacy (advanced, not ANOVA or so) is common among ML practitioners? submitted by /u/pyepyepie [link] [comments]  ( 9 min )
    [D]what are some generative ai techniques to generate visuals synchronized with music
    I wish to generate visuals that are synced with beats etc of the music to offer the "sensory synchronization" effect where visuals closely sync with the music. I have found Lucid sonic dreams, but it appears to be quite buggy and likely no longer supported. any recommendations for tools I can leverage for a hobby->serious project of generating visuals synced with music. edit: I looked into simple approaches using fft like described here. But I was hoping there are newer generative ai techniques we could leverage. submitted by /u/bluzkluz [link] [comments]  ( 9 min )
    [D] Starting a research lab, any advice on computing infrastructure?
    I'm starting a research lab at a Top 25 research university next year and my research agenda is focused on AI/DL for a scientific domain, I have to come up with a plan and budget for my software/hardware needs. My Context: I have experience setting up linux systems, building computers by myself and training DL models (CNNs/GNNs/LLMs) in a corporate setting. I am venturing to the academic world where resources are more constrained and so I am wondering if there are any guides, tips on setting up a research computer lab that does AI. I do not expect to train from scratch an LLM but maybe finetuning an small LLM. I might also need to do a lot parallelizable IO work to preprocess data. I will talk with the IT department of the university but would like to have some ideas before that conversation. I would appreciate any tips or thoughts, particularly on: * How many GPUs/CPUs to buy? Balancing cost/compute. * Recommended software for managing resources. * Running 100s-1ks of CPU jobs in parallel. * Local compute cluster vs buying compute online. * Hyperparameter optimization and logging of metrics. * Anything else you can think of? submitted by /u/prof_is_training [link] [comments]  ( 9 min )
    [P] GoodWiki Dataset (MIT): Wikipedia Articles in Markdown With Lists, Blockquotes, and More
    Location: https://huggingface.co/datasets/euirim/goodwiki Hi everyone, just wanted to share a dataset I've been working on for use in a personal project! GoodWiki is a 179 million token dataset of English Wikipedia articles collected on September 4, 2023, that have been marked as Good or Featured by Wikipedia editors. The dataset provides these articles in GitHub-flavored Markdown format, preserving layout features like lists, code blocks, math, and block quotes, unlike many other public Wikipedia datasets. Articles are accompanied by a short description of the page as well as any associated categories. Thanks to a careful conversion process from wikicode, the markup language used by Wikipedia, articles in GoodWiki are generally faithful reproductions of the corresponding original Wikipedia pages, minus references, files, infoboxes, and tables. Curated template transclusion and HTML tag handling have minimized instances where entire words and phrases are missing mid-sentence like in other public Wikipedia datasets. GoodWiki is more than 1.5 times larger (when compared using the same tokenizer) than the widely used WikiText-103 dataset by Merity et al., even after excluding article descriptions. Also limited to articles marked as Good or Featured, WikiText inspired GoodWiki. submitted by /u/euirim [link] [comments]  ( 9 min )
    [D]Suffer from a lack of opportunities in ML?
    I wanna learn ML and i love this field there is people told me that the opportunities in this field is few (i live in egypt btw)and in USA also they suffer from a lack of opportunities in this field so is that true and should i choose another field or not because i really wanna learn ML . submitted by /u/Opening-Being-7692 [link] [comments]  ( 9 min )
    [P] Model predicting the same outcome for all entries
    Im currently working on deploying an ML model that predicts NFL MVPs (two possible outputs: 0 for not MVP and 1 for MVP). That being said, during deployment it is currently predicting 0 for all inputs regardless of how varied the inputs may be. However, during the testing phase my model had a varied accuracy rate of 75%-85%. I'm beginning to think that it is very likely that during the testing phase the model was predicting 0 for everything and just getting these higher accuracy rates because of the dominance of non-MVPs within my dataset. This all being said, I'm a noob to ML and decided it'd be best to come on here for help. Is that the likely reason and if so how do I go about fixing it? Furthermore, what other issues could cause this and how would I go about fixing it? submitted by /u/saggyboobsarecooltoo [link] [comments]  ( 9 min )
    [P] I made a website that uses LLMs to help you gain insights about your documents
    submitted by /u/spline_reticulator [link] [comments]  ( 9 min )
    LLM on Blockchain? [D]
    I recently discovered a python library called 'Petals' that should run most LLM models (LLaMA2, Stable Beluga) on some kind of "torrent" for machine learning on your device: https://github.com/bigscience-workshop/petals/, https://www.youtube.com/watch?v=8jEGVaRKmFc Furthermore, if you participate as a "server" in this "torrent" you can be rewarded (according to YouTube) Sounds too good to be true, so I didn't try it yet. Does anyone have experience with it? Can I get a virus from there? Is it not a scam? submitted by /u/Degenerat666 [link] [comments]  ( 9 min )
    [D] Unified Retrieval Augmented Generation - URAG
    PostgresML takes Retrieval Augmented Generation (RAG) a step further, by running the models and vector (or btree) indexes in the same process space that also caches the data, so the retrieval step doesn't require any networking or data (de)serialization. This makes it significantly faster and more reliable than other architectures, which is important for online or interactive applications. For lack of a better term, I'm referring to this as Unified Retrieval Augmented Generation (URAG). The closest I can find to this in the literature is TABR which utilizes recall to improve the performance of LLMs relative to tree based models, although this particular application of Retrieval is extremely computationally expensive for a slight performance improvement, unlike RAG for LLMs that seems to provide significant new capabilities to the model. Is there more research going on for what I'd refer to as URAG systems? We're building open source Python & JS SDKs to make the underlying SQL API more accessible, but I'm wondering if this community, or the academic community has already done more work in this area that we should be aware of. submitted by /u/something_cleverer [link] [comments]  ( 9 min )
    [D]: How does chatbot development look like in practise?
    With all the recent advancements in LLMs, how does chatbot development look like in practise? Suppose I want a chatbot to help with customer service. Can I then just collect some frequent / common questions about items the company might offer for sale and use a pretrained LLM to answer these questions correctly via transfer learning? Is there some effort that goes into explicit knowledge storage? submitted by /u/Blutorangensaft [link] [comments]  ( 9 min )
    [P] AI-Powered Valley Girl creating content autonomously on Twitter
    Hey everyone! I recently started a project where I make AI-powered characters that create content autonomously on social media. The first character that I launched yesterday is your typical valley girl - Alix - who hangs around tech and crypto people. She browses Twitter all day and writes tweets with her opinion of the stuff she comes by. With a unique mood generated for her every day, it's always a gamble to see who lands on her good or bad side. I would really appreciate it if you check her out and let me know what you think! I'm interested in further developing the project and making these synthetic entities equivalent to real people in entertaiment and companionship. Her twitter - https://twitter.com/alix_H2O submitted by /u/GuaranteeAny2894 [link] [comments]  ( 9 min )
    [N] NVIDIA's Groundbreaking TensorRT-LLM Can Double Inference Performance of Language Models
    submitted by /u/norcalnatv [link] [comments]  ( 9 min )
    [P] Is MDP suitable for DQN representation ?
    Problem description For ongoing research I'm defining a decision problem in the form of an MDP. In a simple form, this MDP can be solved via Dynamic Programming. Of course I would like to scale up my MDP by including a sensor simulation, which renders the MDP to such a complexity that my next step is to use a DQN. My challenge is that I'm not sure how to represent my more complex MDP in a fitting DQN. Quite frankly, the more I look into DQN's, the more I wonder if my MDP is correctly defined. It would be great to hear your views and suggestions if you like to share. Definition of the MDP The MDP aims at detecting and chasing away a mosquito. For each distance step (discrete distances although it should be continuous), for instance Si, we cycle through an episode: we detect the mosquito…  ( 10 min )
    [D] Map of the ELLIS units(European Laboratory for Learning and Intelligent Systems). What do the people here think of ELLIS? Anyone who worked with them who can share the experience? Do you think it will help close the gap between Europe and US? (sorry for the bad edit I'm not good at it)
    submitted by /u/Ok_Independent9899 [link] [comments]  ( 9 min )
    [P]Training an image classification model
    Is it normal to achieve a validation accuracy so much higher than the training accuracy? I am using transfer learning to train a convNeXtBaseV1 model on my dataset. I got a training accuracy of 82.9% and a much higher validation accuracy of 97.14%. My dataset is around 9600 medical images and it is balanced between 3 classes. I splitted it into 80% training 10% testing and 10% validation submitted by /u/Different_Hat5643 [link] [comments]  ( 9 min )
    [D] Are there any open source voice cloning models that are capable of cloning other English accents along with the voice?
    Hi. I tried tortoise tts. It is good for cloning how the voice sounds but not the original accent. Are there any open source technologies that can also copy the accent correctly? Please do suggest. Thank you! submitted by /u/salehxoxo [link] [comments]  ( 9 min )
    [P] Vector database integration with PostgreSQL
    Article | Notebook | GitHub There are a rapidly growing number of options in the vector database space. One of the more recent developments is the creation of new vector index plugins for traditional database systems such as pgvector. This is reminiscent of the discussion back in the mid 2010s on whether one should use full text search in the database or sync with an external system such as Elasticsearch. Just as with full text search, it will be tough for vector indexing in the database to compete with the more dedicated solutions past the simple use cases. For example, according the ANN-Benchmarks, the dedicated vector solutions score much higher. The desire to reduce stack complexity and the maturity of systems like Postgres make strong arguments to try to find a way to do it all in the database. The referenced article above proposes a way to integrate existing databases like PostgreSQL with vector indexes such as Faiss, Hnswlib, external vector databases and even keyword indexes like Elasticsearch. This opens up the possibility of combining Postgres features such as fine-grained access control with the performance of a dedicated vector index. ​ submitted by /u/davidmezzetti [link] [comments]  ( 9 min )
    [D] Clustering identical but time-shifted signal together from big database.
    I am working on clustering groups of almost identical (but time shifted) signals together. These clusters will have varying sizes, and I don´t know what the final number of clusters will be. The database consists of up to 100 thousand signals (represented by 4000 long vectors as read from individual text files) which are already quite similar to each other. Ideally I would just make a huge cross-correlation matrix but that is too computationally expensive. So is DTW. What I have done is to apply the fast fourier transform to get it into the frequency domain and therefore get rid of that time shift. Then I would apply a clustering algorithm. I have tried DBSCAN and hierarchical agglomerative clustering which work relatively well but don’t scale well to datasets of this size. Affinity propagation is quick and works okish but I don’t know how to optimise it. Does anyone have any recommendation on which algorithm to use and how to optimise it? Was my idea to apply the fft good? I am not a computer scientist so I am really out of my element. submitted by /u/Bertz-2- [link] [comments]  ( 9 min )
    [P] Hey all! I'm excited to launch GPTCall, a platform that enables real-time voice conversations with Llama 2 and other open-source models! It supports both desktop and mobile browsers. See comments for details.
    submitted by /u/friuns [link] [comments]  ( 9 min )
    [D] Machine Learning Problem: Predictive Maintenance for Industrial Equipment
    I was given following problem in one of the machine learning interviews. I think I messed up there. Need your approach in answering this question. Problem Description: Imagine you are working for a manufacturing company that operates a large fleet of industrial machines. These machines are critical to production, and unexpected breakdowns can result in significant downtime and financial losses. Your task is to develop a predictive maintenance model using machine learning to predict when a machine is likely to fail so that maintenance can be performed just in time to prevent a breakdown. Data: You are provided with historical data for each machine, including sensor readings, maintenance logs, and failure records. The dataset is extensive, containing millions of data points over several ye…  ( 10 min )
    [D] RVC "queue" stuck for over 2 and a half hours?
    Does anyone know what this means if when you import your audio clip and the model you want to use it gets stuck in the queue for over 2 and a half hours? I know that can't be right but I seemed to have followed all the guides correctly not sure what it could be :'( help greatly appreciated submitted by /u/StuntGuy [link] [comments]  ( 9 min )
  • Open

    Belt around an elliptical waist
    I just saw a tweet from Dave Richeson saying I remember as a kid calculating the size difference (diameter) of a belt between each hole. Now I think about it every time I wear a belt. Holes 1 inch apart change the diameter by about one-third of an inch (1/π). [Assuming people have a circular […] Belt around an elliptical waist first appeared on John D. Cook.  ( 6 min )
    Recursive triangle subdivision
    The other day I saw where Cliff Pickover tweeted some images of triangles recursively subdivided by adding a point to the barycenter of each triangle. The images were not what I expected, so I wanted to reproduce the images to look into this further. Here are the first three steps: I set the alpha value […] Recursive triangle subdivision first appeared on John D. Cook.  ( 5 min )
  • Open

    "AI Unleashed: This Week's Top 15 News and Breakthroughs in Artificial Intelligence"
    "AI Unleashed: This Week's Top 15 News and Breakthroughs in Artificial Intelligence" 1"X's Data Revolution: Your Biometrics and Career Fueling the AI of Tomorrow!" In a recent privacy policy shake-up, X has just dropped a bombshell. They're not just interested in your regular data anymore; they're diving deep into the world of biometrics, job histories, and education backgrounds. And that's not all! Another corner of their revamped policy hints at a grand plan: they want to fuse this treasure trove of data with publicly available info to supercharge their machine learning and AI models. This isn't your run-of-the-mill update; it's a quantum leap for X in their quest to build the ultimate AI system. They're not just pushing boundaries; they're smashing through them. By expanding their da…  ( 16 min )
    AI technology behind ChatGPT built was in Iowa – with a lot of water
    The artificial intelligence technology behind ChatGPT was built in Iowa, specifically in the watershed of the Raccoon and Des Moines rivers. Microsoft-backed OpenAI needed a lot of water to cool its powerful supercomputer as it taught its AI systems how to mimic human writing. Leading tech developers like Microsoft, OpenAI, and Google have acknowledged the high costs associated with the growing demand for AI tools, including expensive semiconductors and increased water consumption. Microsoft disclosed a 34% spike in global water consumption, largely attributed to its AI research. A researcher estimates that ChatGPT uses 500 milliliters of water for every series of prompts or questions. Source : https://news.yahoo.com/artificial-intelligence-technology-behind-chatgpt-131421382.html Summarized by Nuse AI submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Teaching LLMs to be more reasonable
    Based on a bit of research and a lot of gut feeling, I offer the following speculation: if you self-trained an LLM with a Python interpreter or Java compiler in a feedback loop where it learned from its own mistakes then it could become dramatically better at coding. It's actually a miracle that they are "decent" at coding despite getting virtually no feedback from an interpreter or compiler. one could train not merely on input and output, but also on an execution trace so the LLM learned HOW the interpreter got the result one could also train the model on how to install and invoke open source software and thus it would learn about a variety of languages, versions and runtimes this might also improve its logical reasoning skills in general Admittedly, running programs is a lot more expensive than doing simple next-word prediction on pre-existing texts. But on the other hand, a corpus of a million program executions can also be used to train future LLMs. You can keep the execution information forever and re-use it as traditional next-token prediction input. submitted by /u/Smallpaul [link] [comments]  ( 9 min )
    Article - "As a writer, I’m afraid of capitalism — not ChatGPT."
    submitted by /u/LaVolpe223 [link] [comments]  ( 9 min )
    100% Free and unlimited...text-to-video AI with optional image reference
    submitted by /u/the_anonymizer [link] [comments]  ( 8 min )
    I made a free tool that allows you to create a personalized AI newsletter containing all of the content you already follow. The app will automatically pull in the top or latest posts from your selected sources so you don't miss anything important.
    submitted by /u/PlayfulPhilosopher42 [link] [comments]  ( 9 min )
    LLM with a voice interface?
    So LLM's like ChatGPT have been around for a while now, and have good APIs, and also voice to text dictation and text to voice generation are close to perfect these days... are there any services that join them all together? I'd like to chat with ChatGPT or similar by talking to my smart speaker a la Google Home, and have it respond by speaking, and to be able to reply and continue the conversation. Does anyone know if this exists yet? submitted by /u/singeblanc [link] [comments]  ( 9 min )
    AI subtitles?
    hey guys, im trying to subtitle a spanish video to have english subtitles, does anyone know a good way to do it for free? submitted by /u/deletemkw [link] [comments]  ( 9 min )
    "The 7 Stages of AI" | AI Uncovered
    submitted by /u/Tao_Dragon [link] [comments]  ( 9 min )
    In AI Regulation Coverage, Media Let Lawmakers Off the Hook
    The media often portrays lawmakers as unable to regulate artificial intelligence (AI) due to its complexity and evolving nature. This narrative overlooks the responsibility of lawmakers and their regulatory inertia. The media frames AI regulation as a matter of technical knowledge rather than moral consideration. The New York Times highlights the slow congressional response to new technologies and the potential influence of lawmakers' financial interests in AI companies. The media fails to question why lawmakers, who have profited from AI, cannot apply their knowledge to regulate it. This lack of critical information in news sources defends the inaction of lawmakers. Source : https://fair.org/home/in-ai-regulation-coverage-media-let-lawmakers-off-the-hook/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    NVIDIA TensorRT-LLM Supercharges Large Language Model Inference on NVIDIA H100 GPUs
    submitted by /u/basitmakine [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/8/2023
    TIME just picked a list of 100 Most Influential People in AI.[1] AI Startup Imbue Tops $1 Billion Valuation After Funding from Nvidia.[2] Microsoft offers legal protection for AI copyright infringement challenges.[3] US chipmaker Nvidia Corp. on Friday announced separate partnerships with Reliance and Tata group companies to help them develop AI-powered supercomputers, AI clouds and generative AI applications.[4] Sources: [1] https://time.com/6311323/how-we-chose-time100-ai/ [2] https://www.bloomberg.com/news/articles/2023-09-07/ai-startup-imbue-tops-1-billion-valuation-after-funding-from-nvidia?embedded-checkout=true [3] https://arstechnica.com/information-technology/2023/09/microsoft-offers-legal-protection-for-ai-copyright-infringement-challenges/ [4] https://www.livemint.com/companies/news/reliance-tata-cos-sign-ai-partnerships-with-nvidia-11694198851600.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    RVC "queue" is stuck for over 2 and a half hours now?
    Does anyone know what this means if when you import your audio clip and the model you want to use it gets stuck in the queue for over 2 and a half hours? I know that can't be right but I seemed to have followed all the guides correctly not sure what it could be :'( help greatly appreciated submitted by /u/StuntGuy [link] [comments]  ( 9 min )
    ChatGPT hype is fading ! Traffic drops from 1.6 bn to 1.4 bn users in 3 months
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 9 min )
    Paige partners with Microsoft to construct world's largest AI model for battling cancer
    Healthcare technology disruptor Paige is teaming up with Microsoft in the race against cancer. The collaboration aims to revolutionize cancer diagnosis and patient care by building the largest image-based artificial intelligence model for digital pathology and oncology. To stay one step ahead in AI transformations, subscribe here. Transforming cancer imaging Using Microsoft’s advanced supercomputing infrastructure, Paige aims to take cancer imaging to the next level. By combining its deep AI expertise with Microsoft’s enormous computing power, this model augments accuracy and brings in novel capabilities in cancer diagnostics. In the next phase, Paige will incorporate up to four million digitized microscopy slides from its petabyte-scale clinical data archive. A milestone in oncology The Paige and Microsoft partnership is a game changer in advancing healthcare, equipping physicians with unprecedented insights into cancer pathology. Paige is the first company to receive FDA approval for a clinical AI application in digital pathology. The technology is set to increase diagnostic confidence, productivity, and expand treatment options for patients worldwide. (source) P.S. If you love these analyses, I write a free newsletter to track the most significant news and research in AI and tech. Experts from Google, Meta, OpenAI, and more read it daily. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    The latest Tesla self-driving car iteration is a behavior-cloning NN
    submitted by /u/gwern [link] [comments]  ( 9 min )
    Multi Agent RL Project Ideas/Implementation
    I'm looking for some ideas on Multi Agent RL that preferably involve Robotics. I've came up with two ideas based on essentially similar themes: 1) Multiple robots tasked with cleaning a large room (with obstacles) 2) Multiple robots tasked with a search and rescue like mission in a particular area. Both are basically applications of n agents trying to collectively cover a region. Can someone recommend some frameworks and libraries that can allow me to simulate these ideas? Also, I'd love to hear some other ideas as well which use multi-agent RL for robotic applications. For now I'm only targeting a simulation based project. If I get time later I'd love to implement them on hardware as well. Thanks in advance! submitted by /u/esem29 [link] [comments]  ( 9 min )
    New Textbook "Multi-Agent Reinforcement Learning: Foundations and Modern Approaches"
    New introduction textbook titled "Multi-Agent Reinforcement Learning: Foundations and Modern Approaches" by Stefano V. Albrecht, Filippos Christianos, Lukas Schäfer, to be published by MIT Press. The book draft can be downloaded here: https://www.marl-book.com/ ​ submitted by /u/vuttigiquoje-4292 [link] [comments]  ( 9 min )
    A simple analysis of why IPPO performs better than MAPPO in MARL
    To review IPPO vs. MAPPO, first, let's revisit the paper "Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?". ( https://arxiv.org/abs/2011.09533 ) The paper shows that simply applying PPO to the SMAC task to construct an IPPO (independent PPO) algorithm like IQL can surpass QMIX on SMAC. IPPO demonstrates the effectiveness of applying PPO to multi-agent systems. The paper further extends IPPO to MAPPO. The difference is that the critic of PPO uses the global state instead of the observation as input. Surprisingly, the global information does not enhance the actual performance of IPPO. ​ https://preview.redd.it/1wqqlj9z05nb1.png?width=1440&format=png&auto=webp&s=0d6ca1faa0e872151abdb6bb1e48884c6b51e71a ​ https://preview.redd.it/ei9uacl015nb1.png?width=1440&fo…  ( 10 min )
  • Open

    DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection. (arXiv:2309.03893v1 [cs.CV])
    Data is the cornerstone of deep learning. This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection. Existing methods for scaling up detection-oriented data often require manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs, which are costly, complex, or lacking diversity. To address these issues, we presentDiffusionEngine (DE), a data scaling-up engine that provides high-quality detection-oriented training pairs in a single stage. DE consists of a pre-trained diffusion model and an effective Detection-Adapter, contributing to generating scalable, diverse and generalizable detection data in a plug-and-play manner. Detection-Adapter is learned to align the implicit semantic and location knowledge in off-the-shelf diffusion models with detection-aware signals to make better bounding-box predictions. Additionally, we contribute two datasets, i.e., COCO-DE and VOC-DE, to scale up existing detection benchmarks for facilitating follow-up research. Extensive experiments demonstrate that data scaling-up via DE can achieve significant improvements in diverse scenarios, such as various detection algorithms, self-supervised pre-training, data-sparse, label-scarce, cross-domain, and semi-supervised learning. For example, when using DE with a DINO-based adapter to scale up data, mAP is improved by 3.1% on COCO, 7.6% on VOC, and 11.5% on Clipart.  ( 2 min )
    M3FGM:a node masking and multi-granularity message passing-based federated graph model for spatial-temporal data prediction. (arXiv:2210.16193v3 [cs.LG] UPDATED)
    Researchers are solving the challenges of spatial-temporal prediction by combining Federated Learning (FL) and graph models with respect to the constrain of privacy and security. In order to make better use of the power of graph model, some researchs also combine split learning(SL). However, there are still several issues left unattended: 1) Clients might not be able to access the server during inference phase; 2) The graph of clients designed manually in the server model may not reveal the proper relationship between clients. This paper proposes a new GNN-oriented split federated learning method, named node {\bfseries M}asking and {\bfseries M}ulti-granularity {\bfseries M}essage passing-based Federated Graph Model (M$^3$FGM) for the above issues. For the first issue, the server model of M$^3$FGM employs a MaskNode layer to simulate the case of clients being offline. We also redesign the decoder of the client model using a dual-sub-decoders structure so that each client model can use its local data to predict independently when offline. As for the second issue, a new GNN layer named Multi-Granularity Message Passing (MGMP) layer enables each client node to perceive global and local information. We conducted extensive experiments in two different scenarios on two real traffic datasets. Results show that M$^3$FGM outperforms the baselines and variant models, achieves the best results in both datasets and scenarios.  ( 3 min )
    Mixup-Augmented Meta-Learning for Sample-Efficient Fine-Tuning of Protein Simulators. (arXiv:2308.15116v2 [cs.LG] UPDATED)
    Molecular dynamics simulations have emerged as a fundamental instrument for studying biomolecules. At the same time, it is desirable to perform simulations of a collection of particles under various conditions in which the molecules can fluctuate. In this paper, we explore and adapt the soft prompt-based learning method to molecular dynamics tasks. Our model can remarkably generalize to unseen and out-of-distribution scenarios with limited training data. While our work focuses on temperature as a test case, the versatility of our approach allows for efficient simulation through any continuous dynamic conditions, such as pressure and volumes. Our framework has two stages: 1) Pre-trains with data mixing technique, augments molecular structure data and temperature prompts, then applies a curriculum learning method by increasing the ratio of them smoothly. 2) Meta-learning-based fine-tuning framework improves sample-efficiency of fine-tuning process and gives the soft prompt-tuning better initialization points. Comprehensive experiments reveal that our framework excels in accuracy for in-domain data and demonstrates strong generalization capabilities for unseen and out-of-distribution samples.  ( 2 min )
    Proper Learning of Linear Dynamical Systems as a Non-Commutative Polynomial Optimisation Problem. (arXiv:2002.01444v5 [math.OC] UPDATED)
    There has been much recent progress in forecasting the next observation of a linear dynamical system (LDS), which is known as the improper learning, as well as in the estimation of its system matrices, which is known as the proper learning of LDS. We present an approach to proper learning of LDS, which in spite of the non-convexity of the problem, guarantees global convergence of numerical solutions to a least-squares estimator. We present promising computational results.  ( 2 min )
    Blink: Link Local Differential Privacy in Graph Neural Networks via Bayesian Estimation. (arXiv:2309.03190v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have gained an increasing amount of popularity due to their superior capability in learning node embeddings for various graph inference tasks, but training them can raise privacy concerns. To address this, we propose using link local differential privacy over decentralized nodes, enabling collaboration with an untrusted server to train GNNs without revealing the existence of any link. Our approach spends the privacy budget separately on links and degrees of the graph for the server to better denoise the graph topology using Bayesian estimation, alleviating the negative impact of LDP on the accuracy of the trained GNNs. We bound the mean absolute error of the inferred link probabilities against the ground truth graph topology. We then propose two variants of our LDP mechanism complementing each other in different privacy settings, one of which estimates fewer links under lower privacy budgets to avoid false positive link estimates when the uncertainty is high, while the other utilizes more information and performs better given relatively higher privacy budgets. Furthermore, we propose a hybrid variant that combines both strategies and is able to perform better across different privacy budgets. Extensive experiments show that our approach outperforms existing methods in terms of accuracy under varying privacy budgets.  ( 3 min )
    AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning. (arXiv:2308.13280v2 [physics.ao-ph] UPDATED)
    The atmosphere affects humans in a multitude of ways, from loss of life due to adverse weather effects to long-term social and economic impacts on societies. Computer simulations of atmospheric dynamics are, therefore, of great importance for the well-being of our and future generations. Here, we propose AtmoRep, a novel, task-independent stochastic computer model of atmospheric dynamics that can provide skillful results for a wide range of applications. AtmoRep uses large-scale representation learning from artificial intelligence to determine a general description of the highly complex, stochastic dynamics of the atmosphere from the best available estimate of the system's historical trajectory as constrained by observations. This is enabled by a novel self-supervised learning objective and a unique ensemble that samples from the stochastic model with a variability informed by the one in the historical record. The task-independent nature of AtmoRep enables skillful results for a diverse set of applications without specifically training for them and we demonstrate this for nowcasting, temporal interpolation, model correction, and counterfactuals. We also show that AtmoRep can be improved with additional data, for example radar observations, and that it can be extended to tasks such as downscaling. Our work establishes that large-scale neural networks can provide skillful, task-independent models of atmospheric dynamics. With this, they provide a novel means to make the large record of atmospheric observations accessible for applications and for scientific inquiry, complementing existing simulations based on first principles.  ( 3 min )
    Privacy-preserving Continual Federated Clustering via Adaptive Resonance Theory. (arXiv:2309.03487v1 [cs.LG])
    With the increasing importance of data privacy protection, various privacy-preserving machine learning methods have been proposed. In the clustering domain, various algorithms with a federated learning framework (i.e., federated clustering) have been actively studied and showed high clustering performance while preserving data privacy. However, most of the base clusterers (i.e., clustering algorithms) used in existing federated clustering algorithms need to specify the number of clusters in advance. These algorithms, therefore, are unable to deal with data whose distributions are unknown or continually changing. To tackle this problem, this paper proposes a privacy-preserving continual federated clustering algorithm. In the proposed algorithm, an adaptive resonance theory-based clustering algorithm capable of continual learning is used as a base clusterer. Therefore, the proposed algorithm inherits the ability of continual learning. Experimental results with synthetic and real-world datasets show that the proposed algorithm has superior clustering performance to state-of-the-art federated clustering algorithms while realizing data privacy protection and continual learning ability. The source code is available at \url{https://github.com/Masuyama-lab/FCAC}.  ( 2 min )
    Explanation Shift: How Did the Distribution Shift Impact the Model?. (arXiv:2303.08081v2 [cs.LG] UPDATED)
    As input data distributions evolve, the predictive performance of machine learning models tends to deteriorate. In practice, new input data tend to come without target labels. Then, state-of-the-art techniques model input data distributions or model prediction distributions and try to understand issues regarding the interactions between learned models and shifting distributions. We suggest a novel approach that models how explanation characteristics shift when affected by distribution shifts. We find that the modeling of explanation shifts can be a better indicator for detecting out-of-distribution model behaviour than state-of-the-art techniques. We analyze different types of distribution shifts using synthetic examples and real-world data sets. We provide an algorithmic method that allows us to inspect the interaction between data set features and learned models and compare them to the state-of-the-art. We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.  ( 2 min )
    Revisiting Hidden Representations in Transfer Learning for Medical Imaging. (arXiv:2302.08272v2 [cs.CV] UPDATED)
    While a key component to the success of deep learning is the availability of massive amounts of training data, medical image datasets are often limited in diversity and size. Transfer learning has the potential to bridge the gap between related yet different domains. For medical applications, however, it remains unclear whether it is more beneficial to pre-train on natural or medical images. We aim to shed light on this problem by comparing initialization on ImageNet and RadImageNet on seven medical classification tasks. Our work includes a replication study, which yields results contrary to previously published findings. In our experiments, ResNet50 models pre-trained on ImageNet tend to outperform those trained on RadImageNet. To gain further insights, we investigate the learned representations using Canonical Correlation Analysis (CCA) and compare the predictions of the different models. Our results indicate that, contrary to intuition, ImageNet and RadImageNet may converge to distinct intermediate representations, which appear to diverge further during fine-tuning. Despite these distinct representations, the predictions of the models remain similar. Our findings show that the similarity between networks before and after fine-tuning does not correlate with performance gains, suggesting that the advantages of transfer learning might not solely originate from the reuse of features in the early layers of a convolutional neural network.
    A Tutorial on the Non-Asymptotic Theory of System Identification. (arXiv:2309.03873v1 [eess.SY])
    This tutorial serves as an introduction to recently developed non-asymptotic methods in the theory of -- mainly linear -- system identification. We emphasize tools we deem particularly useful for a range of problems in this domain, such as the covering technique, the Hanson-Wright Inequality and the method of self-normalized martingales. We then employ these tools to give streamlined proofs of the performance of various least-squares based estimators for identifying the parameters in autoregressive models. We conclude by sketching out how the ideas presented herein can be extended to certain nonlinear identification problems.
    Medoid Silhouette clustering with automatic cluster number selection. (arXiv:2309.03751v1 [cs.LG])
    The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, provide two fast versions for the direct optimization, and discuss the use to choose the optimal number of clusters. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements FasterPAM. One of the versions guarantees equal results to the original variant and provides a run speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm. Additionally, we provide a variant to choose the optimal number of clusters directly.
    Knowledge Distillation Layer that Lets the Student Decide. (arXiv:2309.02843v1 [cs.CV] CROSS LISTED)
    Typical technique in knowledge distillation (KD) is regularizing the learning of a limited capacity model (student) by pushing its responses to match a powerful model's (teacher). Albeit useful especially in the penultimate layer and beyond, its action on student's feature transform is rather implicit, limiting its practice in the intermediate layers. To explicitly embed the teacher's knowledge in feature transform, we propose a learnable KD layer for the student which improves KD with two distinct abilities: i) learning how to leverage the teacher's knowledge, enabling to discard nuisance information, and ii) feeding forward the transferred knowledge deeper. Thus, the student enjoys the teacher's knowledge during the inference besides training. Formally, we repurpose 1x1-BN-ReLU-1x1 convolution block to assign a semantic vector to each local region according to the template (supervised by the teacher) that the corresponding region of the student matches. To facilitate template learning in the intermediate layers, we propose a novel form of supervision based on the teacher's decisions. Through rigorous experimentation, we demonstrate the effectiveness of our approach on 3 popular classification benchmarks. Code is available at: https://github.com/adagorgun/letKD-framework
    On the dynamics of multi agent nonlinear filtering and learning. (arXiv:2309.03557v1 [stat.ML])
    Multiagent systems aim to accomplish highly complex learning tasks through decentralised consensus seeking dynamics and their use has garnered a great deal of attention in the signal processing and computational intelligence societies. This article examines the behaviour of multiagent networked systems with nonlinear filtering/learning dynamics. To this end, a general formulation for the actions of an agent in multiagent networked systems is presented and conditions for achieving a cohesive learning behaviour is given. Importantly, application of the so derived framework in distributed and federated learning scenarios are presented.
    Enhancing Pipeline-Based Conversational Agents with Large Language Models. (arXiv:2309.03748v1 [cs.CL])
    The latest advancements in AI and deep learning have led to a breakthrough in large language model (LLM)-based agents such as GPT-4. However, many commercial conversational agent development tools are pipeline-based and have limitations in holding a human-like conversation. This paper investigates the capabilities of LLMs to enhance pipeline-based conversational agents during two phases: 1) in the design and development phase and 2) during operations. In 1) LLMs can aid in generating training data, extracting entities and synonyms, localization, and persona design. In 2) LLMs can assist in contextualization, intent classification to prevent conversational breakdown and handle out-of-scope questions, auto-correcting utterances, rephrasing responses, formulating disambiguation questions, summarization, and enabling closed question-answering capabilities. We conducted informal experiments with GPT-4 in the private banking domain to demonstrate the scenarios above with a practical example. Companies may be hesitant to replace their pipeline-based agents with LLMs entirely due to privacy concerns and the need for deep integration within their existing ecosystems. A hybrid approach in which LLMs' are integrated into the pipeline-based agents allows them to save time and costs of building and running agents by capitalizing on the capabilities of LLMs while retaining the integration and privacy safeguards of their existing systems.
    How to select an objective function using information theory. (arXiv:2212.06566v2 [cs.LG] UPDATED)
    In machine learning or scientific computing, model performance is measured with an objective function. But why choose one objective over another? Information theory gives one answer: To maximize the information in the model, select the most likely objective function or whichever represents the error in the fewest bits. To evaluate different objectives, transform them into likelihood functions. As likelihoods, their relative magnitudes represent how much we should prefer one objective versus another, and the log of their magnitude represents the expected uncertainty of the model.
    Equal Long-term Benefit Rate: Adapting Static Fairness Notions to Sequential Decision Making. (arXiv:2309.03426v1 [cs.LG])
    Decisions made by machine learning models may have lasting impacts over time, making long-term fairness a crucial consideration. It has been shown that when ignoring the long-term effect, naively imposing fairness criterion in static settings can actually exacerbate bias over time. To explicitly address biases in sequential decision-making, recent works formulate long-term fairness notions in Markov Decision Process (MDP) framework. They define the long-term bias to be the sum of static bias over each time step. However, we demonstrate that naively summing up the step-wise bias can cause a false sense of fairness since it fails to consider the importance difference of different time steps during transition. In this work, we introduce a long-term fairness notion called Equal Long-term Benefit Rate (ELBERT), which explicitly considers varying temporal importance and adapts static fairness principles to the sequential setting. Moreover, we show that the policy gradient of Long-term Benefit Rate can be analytically reduced to standard policy gradient. This makes standard policy optimization methods applicable for reducing the bias, leading to our proposed bias mitigation method ELBERT-PO. Experiments on three sequential decision making environments show that ELBERT-PO significantly reduces bias and maintains high utility. Code is available at https://github.com/Yuancheng-Xu/ELBERT.
    Training Acceleration of Low-Rank Decomposed Networks using Sequential Freezing and Rank Quantization. (arXiv:2309.03824v1 [cs.LG])
    Low Rank Decomposition (LRD) is a model compression technique applied to the weight tensors of deep learning models in order to reduce the number of trainable parameters and computational complexity. However, due to high number of new layers added to the architecture after applying LRD, it may not lead to a high training/inference acceleration if the decomposition ranks are not small enough. The issue is that using small ranks increases the risk of significant accuracy drop after decomposition. In this paper, we propose two techniques for accelerating low rank decomposed models without requiring to use small ranks for decomposition. These methods include rank optimization and sequential freezing of decomposed layers. We perform experiments on both convolutional and transformer-based models. Experiments show that these techniques can improve the model throughput up to 60% during training and 37% during inference when combined together while preserving the accuracy close to that of the original models
    Achieving Occam's Razor: Deep Learning for Optimal Model Reduction. (arXiv:2303.13746v2 [cs.LG] UPDATED)
    All fields of science depend on mathematical models. Occam's razor refers to the principle that good models should exclude parameters beyond those minimally required to describe the systems they represent. This is because redundancy can lead to incorrect estimates of model parameters from data, and thus inaccurate or ambiguous conclusions. Here, we show how deep learning can be powerfully leveraged to address Occam's razor. FixFit, our new method, uses a feedforward deep neural network with a bottleneck layer to characterize and predict the behavior of a given model from its input parameters. FixFit has three major benefits. First, it provides a metric to quantify the original model's degree of complexity. Second, it allows for the unique fitting of data. Third, it provides an unbiased way to discriminate between experimental hypotheses that add value versus those that do not. In two use cases, we demonstrate the broad applicability of this method across scientific domains. To validate the method using a known system, we apply FixFit to recover known composite parameters for the Kepler orbit model. To illustrate how the method can be applied to less well-established fields, we use it to identify parameters for a multi-scale brain model and reduce the search space for viable candidate mechanisms.
    Deep Learning Safety Concerns in Automated Driving Perception. (arXiv:2309.03774v1 [cs.LG])
    Recent advances in the field of deep learning and impressive performance of deep neural networks (DNNs) for perception have resulted in an increased demand for their use in automated driving (AD) systems. The safety of such systems is of utmost importance and thus requires to consider the unique properties of DNNs. In order to achieve safety of AD systems with DNN-based perception components in a systematic and comprehensive approach, so-called safety concerns have been introduced as a suitable structuring element. On the one hand, the concept of safety concerns is -- by design -- well aligned to existing standards relevant for safety of AD systems such as ISO 21448 (SOTIF). On the other hand, it has already inspired several academic publications and upcoming standards on AI safety such as ISO PAS 8800. While the concept of safety concerns has been previously introduced, this paper extends and refines it, leveraging feedback from various domain and safety experts in the field. In particular, this paper introduces an additional categorization for a better understanding as well as enabling cross-functional teams to jointly address the concerns.
    Evaluating the Efficacy of Supervised Learning vs Large Language Models for Identifying Cognitive Distortions and Suicidal Risks in Chinese Social Media. (arXiv:2309.03564v1 [cs.CL])
    Large language models, particularly those akin to the rapidly progressing GPT series, are gaining traction for their expansive influence. While there is keen interest in their applicability within medical domains such as psychology, tangible explorations on real-world data remain scant. Concurrently, users on social media platforms are increasingly vocalizing personal sentiments; under specific thematic umbrellas, these sentiments often manifest as negative emotions, sometimes escalating to suicidal inclinations. Timely discernment of such cognitive distortions and suicidal risks is crucial to effectively intervene and potentially avert dire circumstances. Our study ventured into this realm by experimenting on two pivotal tasks: suicidal risk and cognitive distortion identification on Chinese social media platforms. Using supervised learning as a baseline, we examined and contrasted the efficacy of large language models via three distinct strategies: zero-shot, few-shot, and fine-tuning. Our findings revealed a discernible performance gap between the large language models and traditional supervised learning approaches, primarily attributed to the models' inability to fully grasp subtle categories. Notably, while GPT-4 outperforms its counterparts in multiple scenarios, GPT-3.5 shows significant enhancement in suicide risk classification after fine-tuning. To our knowledge, this investigation stands as the maiden attempt at gauging large language models on Chinese social media tasks. This study underscores the forward-looking and transformative implications of using large language models in the field of psychology. It lays the groundwork for future applications in psychological research and practice.
    Punctate White Matter Lesion Segmentation in Preterm Infants Powered by Counterfactually Generative Learning. (arXiv:2309.03440v1 [eess.IV])
    Accurate segmentation of punctate white matter lesions (PWMLs) are fundamental for the timely diagnosis and treatment of related developmental disorders. Automated PWMLs segmentation from infant brain MR images is challenging, considering that the lesions are typically small and low-contrast, and the number of lesions may dramatically change across subjects. Existing learning-based methods directly apply general network architectures to this challenging task, which may fail to capture detailed positional information of PWMLs, potentially leading to severe under-segmentations. In this paper, we propose to leverage the idea of counterfactual reasoning coupled with the auxiliary task of brain tissue segmentation to learn fine-grained positional and morphological representations of PWMLs for accurate localization and segmentation. A simple and easy-to-implement deep-learning framework (i.e., DeepPWML) is accordingly designed. It combines the lesion counterfactual map with the tissue probability map to train a lightweight PWML segmentation network, demonstrating state-of-the-art performance on a real-clinical dataset of infant T1w MR images. The code is available at \href{https://github.com/ladderlab-xjtu/DeepPWML}{https://github.com/ladderlab-xjtu/DeepPWML}.
    M(otion)-mode Based Prediction of Ejection Fraction using Echocardiograms. (arXiv:2309.03759v1 [eess.IV])
    Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), where lower EF is associated with cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we propose using the M(otion)-mode of echocardiograms for estimating the EF and classifying cardiomyopathy. We generate multiple artificial M-mode images from a single echocardiogram and combine them using off-the-shelf model architectures. Additionally, we extend contrastive learning (CL) to cardiac imaging to learn meaningful representations from exploiting structures in unlabeled data allowing the model to achieve high accuracy, even with limited annotations. Our experiments show that the supervised setting converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process and being computationally much more efficient. Furthermore, CL using M-mode images is helpful for limited data scenarios, such as having labels for only 200 patients, which is common in medical applications.
    Conformal Autoregressive Generation: Beam Search with Coverage Guarantees. (arXiv:2309.03797v1 [cs.LG])
    We introduce two new extensions to the beam search algorithm based on conformal predictions (CP) to produce sets of sequences with theoretical coverage guarantees. The first method is very simple and proposes dynamically-sized subsets of beam search results but, unlike typical CP procedures, has an upper bound on the achievable guarantee depending on a post-hoc calibration measure. Our second algorithm introduces the conformal set prediction procedure as part of the decoding process, producing a variable beam width which adapts to the current uncertainty. While more complex, this procedure can achieve coverage guarantees selected a priori. We provide marginal coverage bounds for each method, and evaluate them empirically on a selection of tasks drawing from natural language processing and chemistry.
    REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation. (arXiv:2309.03322v1 [cs.LG])
    Dexterous manipulation tasks involving contact-rich interactions pose a significant challenge for both model-based control systems and imitation learning algorithms. The complexity arises from the need for multi-fingered robotic hands to dynamically establish and break contacts, balance non-prehensile forces, and control large degrees of freedom. Reinforcement learning (RL) offers a promising approach due to its general applicability and capacity to autonomously acquire optimal manipulation strategies. However, its real-world application is often hindered by the necessity to generate a large number of samples, reset the environment, and obtain reward signals. In this work, we introduce an efficient system for learning dexterous manipulation skills with RL to alleviate these challenges. The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping. This combination allows us to utilize data from different tasks or objects as a starting point for training new tasks, significantly improving learning efficiency. Additionally, our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy as well as learned reward functions, eliminating the need for manual resets and reward engineering. We demonstrate the benefits of reusing past data as replay buffer initialization for new tasks, for instance, the fast acquisition of intricate manipulation skills in the real world on a four-fingered robotic hand. (Videos: https://sites.google.com/view/reboot-dexterous)
    Natural Example-Based Explainability: a Survey. (arXiv:2309.03234v1 [cs.AI])
    Explainable Artificial Intelligence (XAI) has become increasingly significant for improving the interpretability and trustworthiness of machine learning models. While saliency maps have stolen the show for the last few years in the XAI field, their ability to reflect models' internal processes has been questioned. Although less in the spotlight, example-based XAI methods have continued to improve. It encompasses methods that use examples as explanations for a machine learning model's predictions. This aligns with the psychological mechanisms of human reasoning and makes example-based explanations natural and intuitive for users to understand. Indeed, humans learn and reason by forming mental representations of concepts based on examples. This paper provides an overview of the state-of-the-art in natural example-based XAI, describing the pros and cons of each approach. A "natural" example simply means that it is directly drawn from the training data without involving any generative process. The exclusion of methods that require generating examples is justified by the need for plausibility which is in some regards required to gain a user's trust. Consequently, this paper will explore the following family of methods: similar examples, counterfactual and semi-factual, influential instances, prototypes, and concepts. In particular, it will compare their semantic definition, their cognitive impact, and added values. We hope it will encourage and facilitate future work on natural example-based XAI.
    The Space of Adversarial Strategies. (arXiv:2209.04521v2 [cs.CR] UPDATED)
    Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade. Yet, our understanding of this phenomenon stems from a rather fragmented pool of knowledge; at present, there are a handful of attacks, each with disparate assumptions in threat models and incomparable definitions of optimality. In this paper, we propose a systematic approach to characterize worst-case (i.e., optimal) adversaries. We first introduce an extensible decomposition of attacks in adversarial machine learning by atomizing attack components into surfaces and travelers. With our decomposition, we enumerate over components to create 576 attacks (568 of which were previously unexplored). Next, we propose the Pareto Ensemble Attack (PEA): a theoretical attack that upper-bounds attack performance. With our new attacks, we measure performance relative to the PEA on: both robust and non-robust models, seven datasets, and three extended lp-based threat models incorporating compute costs, formalizing the Space of Adversarial Strategies. From our evaluation we find that attack performance to be highly contextual: the domain, model robustness, and threat model can have a profound influence on attack efficacy. Our investigation suggests that future studies measuring the security of machine learning should: (1) be contextualized to the domain & threat models, and (2) go beyond the handful of known attacks used today.
    Spatio-Temporal Contrastive Self-Supervised Learning for POI-level Crowd Flow Inference. (arXiv:2309.03239v1 [cs.LG])
    Accurate acquisition of crowd flow at Points of Interest (POIs) is pivotal for effective traffic management, public service, and urban planning. Despite this importance, due to the limitations of urban sensing techniques, the data quality from most sources is inadequate for monitoring crowd flow at each POI. This renders the inference of accurate crowd flow from low-quality data a critical and challenging task. The complexity is heightened by three key factors: 1) \emph{The scarcity and rarity of labeled data}, 2) \emph{The intricate spatio-temporal dependencies among POIs}, and 3) \emph{The myriad correlations between precise crowd flow and GPS reports}. To address these challenges, we recast the crowd flow inference problem as a self-supervised attributed graph representation learning task and introduce a novel \underline{C}ontrastive \underline{S}elf-learning framework for \underline{S}patio-\underline{T}emporal data (\model). Our approach initiates with the construction of a spatial adjacency graph founded on the POIs and their respective distances. We then employ a contrastive learning technique to exploit large volumes of unlabeled spatio-temporal data. We adopt a swapped prediction approach to anticipate the representation of the target subgraph from similar instances. Following the pre-training phase, the model is fine-tuned with accurate crowd flow data. Our experiments, conducted on two real-world datasets, demonstrate that the \model pre-trained on extensive noisy data consistently outperforms models trained from scratch.
    Let Quantum Neural Networks Choose Their Own Frequencies. (arXiv:2309.03279v1 [quant-ph])
    Parameterized quantum circuits as machine learning models are typically well described by their representation as a partial Fourier series of the input features, with frequencies uniquely determined by the feature map's generator Hamiltonians. Ordinarily, these data-encoding generators are chosen in advance, fixing the space of functions that can be represented. In this work we consider a generalization of quantum models to include a set of trainable parameters in the generator, leading to a trainable frequency (TF) quantum model. We numerically demonstrate how TF models can learn generators with desirable properties for solving the task at hand, including non-regularly spaced frequencies in their spectra and flexible spectral richness. Finally, we showcase the real-world effectiveness of our approach, demonstrating an improved accuracy in solving the Navier-Stokes equations using a TF model with only a single parameter added to each encoding operation. Since TF models encompass conventional fixed frequency models, they may offer a sensible default choice for variational quantum machine learning.
    A comparison of rational and neural network based approximations. (arXiv:2303.04436v2 [math.OC] UPDATED)
    Rational and neural network based approximations are efficient tools in modern approximation. These approaches are able to produce accurate approximations to nonsmooth and non-Lipschitz functions, including multivariate domain functions. In this paper we compare the efficiency of function approximation using rational approximation, neural network and their combinations. It was found that rational approximation is superior to neural network based approaches with the same number of decision variables. Our numerical experiments demonstrate the efficiency of rational approximation, even when the number of approximation parameters (that is, the dimension of the corresponding optimisation problems) is small. Another important contribution of this paper lies in the improvement of rational approximation algorithms. Namely, the optimisation based algorithms for rational approximation can be adjusted to in such a way that the conditioning number of the constraint matrices are controlled. This simple adjustment enables us to work with high dimension optimisation problems and improve the design of the neural network. The main strength of neural networks is in their ability to handle models with a large number of variables: complex models are decomposed in several simple optimisation problems. Therefore the the large number of decision variables is in the nature of neural networks.
    Fitness Approximation through Machine Learning. (arXiv:2309.03318v1 [cs.NE])
    We present a novel approach to performing fitness approximation in genetic algorithms (GAs) using machine-learning (ML) models, focusing on evolutionary agents in Gymnasium (game) simulators -- where fitness computation is costly. Maintaining a dataset of sampled individuals along with their actual fitness scores, we continually update throughout an evolutionary run a fitness-approximation ML model. We compare different methods for: 1) switching between actual and approximate fitness, 2) sampling the population, and 3) weighting the samples. Experimental findings demonstrate significant improvement in evolutionary runtimes, with fitness scores that are either identical or slightly lower than that of the fully run GA -- depending on the ratio of approximate-to-actual-fitness computation. Our approach is generic and can be easily applied to many different domains.
    Dynamic Causal Graph Convolutional Network for Traffic Prediction. (arXiv:2306.07019v2 [cs.LG] UPDATED)
    Modeling complex spatiotemporal dependencies in correlated traffic series is essential for traffic prediction. While recent works have shown improved prediction performance by using neural networks to extract spatiotemporal correlations, their effectiveness depends on the quality of the graph structures used to represent the spatial topology of the traffic network. In this work, we propose a novel approach for traffic prediction that embeds time-varying dynamic Bayesian network to capture the fine spatiotemporal topology of traffic data. We then use graph convolutional networks to generate traffic forecasts. To enable our method to efficiently model nonlinear traffic propagation patterns, we develop a deep learning-based module as a hyper-network to generate stepwise dynamic causal graphs. Our experimental results on a real traffic dataset demonstrate the superior prediction performance of the proposed method. The code is available at https://github.com/MonBG/DCGCN.
    Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?. (arXiv:2309.01108v2 [eess.AS] UPDATED)
    $ $Acoustic-to-articulatory inversion (AAI) involves mapping from the acoustic space to the articulatory space. Signal-processing features like the MFCCs, have been widely used for the AAI task. For subjects with dysarthric speech, AAI is challenging because of an imprecise and indistinct pronunciation. In this work, we perform AAI for dysarthric speech using representations from pre-trained self-supervised learning (SSL) models. We demonstrate the impact of different pre-trained features on this challenging AAI task, at low-resource conditions. In addition, we also condition x-vectors to the extracted SSL features to train a BLSTM network. In the seen case, we experiment with three AAI training schemes (subject-specific, pooled, and fine-tuned). The results, consistent across training schemes, reveal that DeCoAR, in the fine-tuned scheme, achieves a relative improvement of the Pearson Correlation Coefficient (CC) by ${\sim}$1.81\% and ${\sim}$4.56\% for healthy controls and patients, respectively, over MFCCs. In the unseen case, we observe similar average trends for different SSL features. Overall, SSL networks like wav2vec, APC, and DeCoAR, which are trained with feature reconstruction or future timestep prediction tasks, perform well in predicting dysarthric articulatory trajectories.
    Evaluating Explanation Methods for Multivariate Time Series Classification. (arXiv:2308.15223v2 [cs.LG] UPDATED)
    Multivariate time series classification is an important computational task arising in applications where data is recorded over time and over multiple channels. For example, a smartwatch can record the acceleration and orientation of a person's motion, and these signals are recorded as multivariate time series. We can classify this data to understand and predict human movement and various properties such as fitness levels. In many applications classification alone is not enough, we often need to classify but also understand what the model learns (e.g., why was a prediction given, based on what information in the data). The main focus of this paper is on analysing and evaluating explanation methods tailored to Multivariate Time Series Classification (MTSC). We focus on saliency-based explanation methods that can point out the most relevant channels and time series points for the classification decision. We analyse two popular and accurate multivariate time series classifiers, ROCKET and dResNet, as well as two popular explanation methods, SHAP and dCAM. We study these methods on 3 synthetic datasets and 2 real-world datasets and provide a quantitative and qualitative analysis of the explanations provided. We find that flattening the multivariate datasets by concatenating the channels works as well as using multivariate classifiers directly and adaptations of SHAP for MTSC work quite well. Additionally, we also find that the popular synthetic datasets we used are not suitable for time series analysis.
    Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders. (arXiv:2202.09671v4 [stat.ML] UPDATED)
    Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain. However, this approach is slow and costly because it needs many forward and reverse steps. We propose a faster and cheaper approach that adds noise not until the data become pure random noise, but until they reach a hidden noisy data distribution that we can confidently learn. Then, we use fewer reverse steps to generate data by starting from this hidden distribution that is made similar to the noisy data. We reveal that the proposed model can be cast as an adversarial auto-encoder empowered by both the diffusion process and a learnable implicit prior. Experimental results show even with a significantly smaller number of reverse diffusion steps, the proposed truncated diffusion probabilistic models can provide consistent improvements over the non-truncated ones in terms of performance in both unconditional and text-guided image generations.
    RatGPT: Turning online LLMs into Proxies for Malware Attacks. (arXiv:2308.09183v2 [cs.CR] UPDATED)
    The evolution of Generative AI and the capabilities of the newly released Large Language Models (LLMs) open new opportunities in software engineering. However, they also lead to new challenges in cybersecurity. Recently, researchers have shown the possibilities of using LLMs such as ChatGPT to generate malicious content that can directly be exploited or guide inexperienced hackers to weaponize tools and code. These studies covered scenarios that still require the attacker to be in the middle of the loop. In this study, we leverage openly available plugins and use an LLM as proxy between the attacker and the victim. We deliver a proof-of-concept where ChatGPT is used for the dissemination of malicious software while evading detection, alongside establishing the communication to a command and control (C2) server to receive commands to interact with a victim's system. Finally, we present the general approach as well as essential elements in order to stay undetected and make the attack a success. This proof-of-concept highlights significant cybersecurity issues with openly available plugins and LLMs, which require the development of security guidelines, controls, and mitigation strategies.
    Short-Term Load Forecasting Using A Particle-Swarm Optimized Multi-Head Attention-Augmented CNN-LSTM Network. (arXiv:2309.03694v1 [cs.LG])
    Short-term load forecasting is of paramount importance in the efficient operation and planning of power systems, given its inherent non-linear and dynamic nature. Recent strides in deep learning have shown promise in addressing this challenge. However, these methods often grapple with hyperparameter sensitivity, opaqueness in interpretability, and high computational overhead for real-time deployment. In this paper, I propose a novel solution that surmounts these obstacles. Our approach harnesses the power of the Particle-Swarm Optimization algorithm to autonomously explore and optimize hyperparameters, a Multi-Head Attention mechanism to discern the salient features crucial for accurate forecasting, and a streamlined framework for computational efficiency. Our method undergoes rigorous evaluation using a genuine electricity demand dataset. The results underscore its superiority in terms of accuracy, robustness, and computational efficiency. Notably, our Mean Absolute Percentage Error of 1.9376 marks a significant advancement over existing state-of-the-art approaches, heralding a new era in short-term load forecasting.
    Polynomial Bounds for Learning Noisy Optical Physical Unclonable Functions and Connections to Learning With Errors. (arXiv:2308.09199v2 [cs.LG] UPDATED)
    It is shown that a class of optical physical unclonable functions (PUFs) can be learned to arbitrary precision with arbitrarily high probability, even in the presence of noise, given access to polynomially many challenge-response pairs and polynomially bounded computational power, under mild assumptions about the distributions of the noise and challenge vectors. This extends the results of Rh\"uramir et al. (2013), who showed a subset of this class of PUFs to be learnable in polynomial time in the absence of noise, under the assumption that the optics of the PUF were either linear or had negligible nonlinear effects. We derive polynomial bounds for the required number of samples and the computational complexity of a linear regression algorithm, based on size parameters of the PUF, the distributions of the challenge and noise vectors, and the probability and accuracy of the regression algorithm, with a similar analysis to one done by Bootle et al. (2018), who demonstrated a learning attack on a poorly implemented version of the Learning With Errors problem.
    Impression-Informed Multi-Behavior Recommender System: A Hierarchical Graph Attention Approach. (arXiv:2309.03169v2 [cs.IR] UPDATED)
    While recommender systems have significantly benefited from implicit feedback, they have often missed the nuances of multi-behavior interactions between users and items. Historically, these systems either amalgamated all behaviors, such as \textit{impression} (formerly \textit{view}), \textit{add-to-cart}, and \textit{buy}, under a singular 'interaction' label, or prioritized only the target behavior, often the \textit{buy} action, discarding valuable auxiliary signals. Although recent advancements tried addressing this simplification, they primarily gravitated towards optimizing the target behavior alone, battling with data scarcity. Additionally, they tended to bypass the nuanced hierarchy intrinsic to behaviors. To bridge these gaps, we introduce the \textbf{H}ierarchical \textbf{M}ulti-behavior \textbf{G}raph Attention \textbf{N}etwork (HMGN). This pioneering framework leverages attention mechanisms to discern information from both inter and intra-behaviors while employing a multi-task Hierarchical Bayesian Personalized Ranking (HBPR) for optimization. Recognizing the need for scalability, our approach integrates a specialized multi-behavior sub-graph sampling technique. Moreover, the adaptability of HMGN allows for the seamless inclusion of knowledge metadata and time-series data. Empirical results attest to our model's prowess, registering a notable performance boost of up to 64\% in NDCG@100 metrics over conventional graph neural network methods.
    Bridging the Gap Between Target Networks and Functional Regularization. (arXiv:2106.02613v4 [stat.ML] UPDATED)
    Bootstrapping is behind much of the successes of deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still misunderstood. In this work, we show that they act as an implicit regularizer which can be beneficial in some cases, but also have disadvantages such as being inflexible and can result in instabilities, even when vanilla TD(0) converges. To overcome these issues, we propose an explicit Functional Regularization alternative that is flexible and a convex regularizer in function space and we theoretically study its convergence. We conduct an experimental study across a range of environments, discount factors, and off-policiness data collections to investigate the effectiveness of the regularization induced by Target Networks and Functional Regularization in terms of performance, accuracy, and stability. Our findings emphasize that Functional Regularization can be used as a drop-in replacement for Target Networks and result in performance improvement. Furthermore, adjusting both the regularization weight and the network update period in Functional Regularization can result in further performance improvements compared to solely adjusting the network update period as typically done with Target Networks. Our approach also enhances the ability to networks to recover accurate $Q$-values.
    Empirical Risk Minimization for Losses without Variance. (arXiv:2309.03818v1 [stat.ML])
    This paper considers an empirical risk minimization problem under heavy-tailed settings, where data does not have finite variance, but only has $p$-th moment with $p \in (1,2)$. Instead of using estimation procedure based on truncated observed data, we choose the optimizer by minimizing the risk value. Those risk values can be robustly estimated via using the remarkable Catoni's method (Catoni, 2012). Thanks to the structure of Catoni-type influence functions, we are able to establish excess risk upper bounds via using generalized generic chaining methods. Moreover, we take computational issues into consideration. We especially theoretically investigate two types of optimization methods, robust gradient descent algorithm and empirical risk-based methods. With an extensive numerical study, we find that the optimizer based on empirical risks via Catoni-style estimation indeed shows better performance than other baselines. It indicates that estimation directly based on truncated data may lead to unsatisfactory results.
    Comparing Sequential Forecasters. (arXiv:2110.00115v5 [stat.ME] UPDATED)
    Consider two forecasters, each making a single prediction for a sequence of events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post-hoc, while avoiding unverifiable assumptions on how the forecasts and outcomes were generated? In this paper, we present a rigorous answer to this question by designing novel sequential inference procedures for estimating the time-varying difference in forecast scores. To do this, we employ confidence sequences (CS), which are sequences of confidence intervals that can be continuously monitored and are valid at arbitrary data-dependent stopping times ("anytime-valid"). The widths of our CSs are adaptive to the underlying variance of the score differences. Underlying their construction is a game-theoretic statistical framework, in which we further identify e-processes and p-processes for sequentially testing a weak null hypothesis -- whether one forecaster outperforms another on average (rather than always). Our methods do not make distributional assumptions on the forecasts or outcomes; our main theorems apply to any bounded scores, and we later provide alternative methods for unbounded scores. We empirically validate our approaches by comparing real-world baseball and weather forecasters.
    Domain Adaptation for Efficiently Fine-tuning Vision Transformer with Encrypted Images. (arXiv:2309.02556v2 [cs.CV] UPDATED)
    In recent years, deep neural networks (DNNs) trained with transformed data have been applied to various applications such as privacy-preserving learning, access control, and adversarial defenses. However, the use of transformed data decreases the performance of models. Accordingly, in this paper, we propose a novel method for fine-tuning models with transformed images under the use of the vision transformer (ViT). The proposed domain adaptation method does not cause the accuracy degradation of models, and it is carried out on the basis of the embedding structure of ViT. In experiments, we confirmed that the proposed method prevents accuracy degradation even when using encrypted images with the CIFAR-10 and CIFAR-100 datasets.
    Efficient anti-symmetrization of a neural network layer by taming the sign problem. (arXiv:2205.12250v2 [cs.LG] UPDATED)
    Explicit antisymmetrization of a neural network is a potential candidate for a universal function approximator for generic antisymmetric functions, which are ubiquitous in quantum physics. However, this procedure is a priori factorially costly to implement, making it impractical for large numbers of particles. The strategy also suffers from a sign problem. Namely, due to near-exact cancellation of positive and negative contributions, the magnitude of the antisymmetrized function may be significantly smaller than before anti-symmetrization. We show that the anti-symmetric projection of a two-layer neural network can be evaluated efficiently, opening the door to using a generic antisymmetric layer as a building block in anti-symmetric neural network Ansatzes. This approximation is effective when the sign problem is controlled, and we show that this property depends crucially the choice of activation function under standard Xavier/He initialization methods. As a consequence, using a smooth activation function requires re-scaling of the neural network weights compared to standard initializations.
    Unlearnable Examples Give a False Sense of Security: Piercing through Unexploitable Data with Learnable Examples. (arXiv:2305.09241v4 [cs.LG] UPDATED)
    Safeguarding data from unauthorized exploitation is vital for privacy and security, especially in recent rampant research in security breach such as adversarial/membership attacks. To this end, \textit{unlearnable examples} (UEs) have been recently proposed as a compelling protection, by adding imperceptible perturbation to data so that models trained on them cannot classify them accurately on original clean distribution. Unfortunately, we find UEs provide a false sense of security, because they cannot stop unauthorized users from utilizing other unprotected data to remove the protection, by turning unlearnable data into learnable again. Motivated by this observation, we formally define a new threat by introducing \textit{learnable unauthorized examples} (LEs) which are UEs with their protection removed. The core of this approach is a novel purification process that projects UEs onto the manifold of LEs. This is realized by a new joint-conditional diffusion model which denoises UEs conditioned on the pixel and perceptual similarity between UEs and LEs. Extensive experiments demonstrate that LE delivers state-of-the-art countering performance against both supervised UEs and unsupervised UEs in various scenarios, which is the first generalizable countermeasure to UEs across supervised learning and unsupervised learning. Our code is available at \url{https://github.com/jiangw-0/LE_JCDP}.
    Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework. (arXiv:2309.02428v2 [cs.LG] UPDATED)
    The burgeoning growth of public domain data and the increasing complexity of deep learning model architectures have underscored the need for more efficient data representation and analysis techniques. This paper is motivated by the work of Helal (2023) and aims to present a comprehensive overview of tensorization. This transformative approach bridges the gap between the inherently multidimensional nature of data and the simplified 2-dimensional matrices commonly used in linear algebra-based machine learning algorithms. This paper explores the steps involved in tensorization, multidimensional data sources, various multiway analysis methods employed, and the benefits of these approaches. A small example of Blind Source Separation (BSS) is presented comparing 2-dimensional algorithms and a multiway algorithm in Python. Results indicate that multiway analysis is more expressive. Contrary to the intuition of the dimensionality curse, utilising multidimensional datasets in their native form and applying multiway analysis methods grounded in multilinear algebra reveal a profound capacity to capture intricate interrelationships among various dimensions while, surprisingly, reducing the number of model parameters and accelerating processing. A survey of the multi-away analysis methods and integration with various Deep Neural Networks models is presented using case studies in different domains.
    ArtiGrasp: Physically Plausible Synthesis of Bi-Manual Dexterous Grasping and Articulation. (arXiv:2309.03891v1 [cs.RO])
    We present ArtiGrasp, a novel method to synthesize bi-manual hand-object interactions that include grasping and articulation. This task is challenging due to the diversity of the global wrist motions and the precise finger control that are necessary to articulate objects. ArtiGrasp leverages reinforcement learning and physics simulations to train a policy that controls the global and local hand pose. Our framework unifies grasping and articulation within a single policy guided by a single hand pose reference. Moreover, to facilitate the training of the precise finger control required for articulation, we present a learning curriculum with increasing difficulty. It starts with single-hand manipulation of stationary objects and continues with multi-agent training including both hands and non-stationary objects. To evaluate our method, we introduce Dynamic Object Grasping and Articulation, a task that involves bringing an object into a target articulated pose. This task requires grasping, relocation, and articulation. We show our method's efficacy towards this task. We further demonstrate that our method can generate motions with noisy hand-object pose estimates from an off-the-shelf image-based regressor.
    USE-Evaluator: Performance Metrics for Medical Image Segmentation Models with Uncertain, Small or Empty Reference Annotations. (arXiv:2209.13008v4 [eess.IV] UPDATED)
    Performance metrics for medical image segmentation models are used to measure the agreement between the reference annotation and the predicted segmentation. Usually, overlap metrics, such as the Dice, are used as a metric to evaluate the performance of these models in order for results to be comparable. However, there is a mismatch between the distributions of cases and difficulty level of segmentation tasks in public data sets compared to clinical practice. Common metrics fail to measure the impact of this mismatch, especially for clinical data sets that include low signal pathologies, a difficult segmentation task, and uncertain, small, or empty reference annotations. This limitation may result in ineffective research of machine learning practitioners in designing and optimizing models. Dimensions of evaluating clinical value include consideration of the uncertainty of reference annotations, independence from reference annotation volume size, and evaluation of classification of empty reference annotations. We study how uncertain, small, and empty reference annotations influence the value of metrics for medical image segmentation on an in-house data set regardless of the model. We examine metrics behavior on the predictions of a standard deep learning framework in order to identify metrics with clinical value. We compare to a public benchmark data set (BraTS 2019) with a high-signal pathology and certain, larger, and no empty reference annotations. We may show machine learning practitioners, how uncertain, small, or empty reference annotations require a rethinking of the evaluation and optimizing procedures. The evaluation code was released to encourage further analysis of this topic. https://github.com/SophieOstmeier/UncertainSmallEmpty.git
    Learning from Demonstration via Probabilistic Diagrammatic Teaching. (arXiv:2309.03835v1 [cs.RO])
    Learning for Demonstration (LfD) enables robots to acquire new skills by imitating expert demonstrations, allowing users to communicate their instructions in an intuitive manner. Recent progress in LfD often relies on kinesthetic teaching or teleoperation as the medium for users to specify the demonstrations. Kinesthetic teaching requires physical handling of the robot, while teleoperation demands proficiency with additional hardware. This paper introduces an alternative paradigm for LfD called Diagrammatic Teaching. Diagrammatic Teaching aims to teach robots novel skills by prompting the user to sketch out demonstration trajectories on 2D images of the scene, these are then synthesised as a generative model of motion trajectories in 3D task space. Additionally, we present the Ray-tracing Probabilistic Trajectory Learning (RPTL) framework for Diagrammatic Teaching. RPTL extracts time-varying probability densities from the 2D sketches, applies ray-tracing to find corresponding regions in 3D Cartesian space, and fits a probabilistic model of motion trajectories to these regions. New motion trajectories, which mimic those sketched by the user, can then be generated from the probabilistic model. We empirically validate our framework both in simulation and on real robots, which include a fixed-base manipulator and a quadruped-mounted manipulator.
    Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation. (arXiv:2309.02685v2 [cs.RO] UPDATED)
    Recent studies have verified that equivariant methods can significantly improve the data efficiency, generalizability, and robustness in robot learning. Meanwhile, denoising diffusion-based generative modeling has recently gained significant attention as a promising approach for robotic manipulation learning from demonstrations with stochastic behaviors. In this paper, we present Diffusion-EDFs, a novel approach that incorporates spatial roto-translation equivariance, i.e., SE(3)-equivariance to diffusion generative modeling. By integrating SE(3)-equivariance into our model architectures, we demonstrate that our proposed method exhibits remarkable data efficiency, requiring only 5 to 10 task demonstrations for effective end-to-end training. Furthermore, our approach showcases superior generalizability compared to previous diffusion-based manipulation methods.
    Sparse Federated Training of Object Detection in the Internet of Vehicles. (arXiv:2309.03569v1 [cs.LG])
    As an essential component part of the Intelligent Transportation System (ITS), the Internet of Vehicles (IoV) plays a vital role in alleviating traffic issues. Object detection is one of the key technologies in the IoV, which has been widely used to provide traffic management services by analyzing timely and sensitive vehicle-related information. However, the current object detection methods are mostly based on centralized deep training, that is, the sensitive data obtained by edge devices need to be uploaded to the server, which raises privacy concerns. To mitigate such privacy leakage, we first propose a federated learning-based framework, where well-trained local models are shared in the central server. However, since edge devices usually have limited computing power, plus a strict requirement of low latency in IoVs, we further propose a sparse training process on edge devices, which can effectively lighten the model, and ensure its training efficiency on edge devices, thereby reducing communication overheads. In addition, due to the diverse computing capabilities and dynamic environment, different sparsity rates are applied to edge devices. To further guarantee the performance, we propose, FedWeg, an improved aggregation scheme based on FedAvg, which is designed by the inverse ratio of sparsity rates. Experiments on the real-life dataset using YOLO show that the proposed scheme can achieve the required object detection rate while saving considerable communication costs.
    OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs. (arXiv:2309.03876v1 [cs.CL])
    Instruction-tuned Large Language Models (LLMs) have recently showcased remarkable ability to generate fitting responses to natural language instructions. However, an open research question concerns the inherent biases of trained models and their responses. For instance, if the data used to tune an LLM is dominantly written by persons with a specific political bias, we might expect generated answers to share this bias. Current research work seeks to de-bias such models, or suppress potentially biased answers. With this demonstration, we take a different view on biases in instruction-tuning: Rather than aiming to suppress them, we aim to make them explicit and transparent. To this end, we present OpinionGPT, a web demo in which users can ask questions and select all biases they wish to investigate. The demo will answer this question using a model fine-tuned on text representing each of the selected biases, allowing side-by-side comparison. To train the underlying model, we identified 11 different biases (political, geographic, gender, age) and derived an instruction-tuning corpus in which each answer was written by members of one of these demographics. This paper presents OpinionGPT, illustrates how we trained the bias-aware model and showcases the web application (available at https://opiniongpt.informatik.hu-berlin.de).
    ImageBind-LLM: Multi-modality Instruction Tuning. (arXiv:2309.03905v1 [cs.MM])
    We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic by only image-text alignment training. During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder. Then, the image features transformed by the bind network are added to word tokens of all layers in LLaMA, which progressively injects visual instructions via an attention-free and zero-initialized gating mechanism. Aided by the joint embedding of ImageBind, the simple image-text training enables our model to exhibit superior multi-modality instruction-following capabilities. During inference, the multi-modality inputs are fed into the corresponding ImageBind encoders, and processed by a proposed visual cache model for further cross-modal embedding enhancement. The training-free cache model retrieves from three million image features extracted by ImageBind, which effectively mitigates the training-inference modality discrepancy. Notably, with our approach, ImageBind-LLM can respond to instructions of diverse modalities and demonstrate significant language generation quality. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.
    Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints. (arXiv:2304.06104v2 [cs.LG] UPDATED)
    This paper studies the problem of online performance optimization of constrained closed-loop control systems, where both the objective and the constraints are unknown black-box functions affected by exogenous time-varying contextual disturbances. A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution under certain regularity conditions. Furthermore, the algorithm achieves zero time-average constraint violation, ensuring that the average value of the constraint function satisfies the desired constraint. The method is applied to both sampled instances from Gaussian processes and a continuous stirred tank reactor parameter tuning problem; simulation results show that the method simultaneously provides close-to-optimal performance and maintains constraint feasibility on average. This contrasts current state-of-the-art methods, which either suffer from large cumulative regret or severe constraint violations for the case studies presented.
    Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity. (arXiv:2111.06781v3 [cs.LG] UPDATED)
    Reinforcement learning algorithms often require finiteness of state and action spaces in Markov decision processes (MDPs) (also called controlled Markov chains) and various efforts have been made in the literature towards the applicability of such algorithms for continuous state and action spaces. In this paper, we show that under very mild regularity conditions (in particular, involving only weak continuity of the transition kernel of an MDP), Q-learning for standard Borel MDPs via quantization of states and actions (called Quantized Q-Learning) converges to a limit, and furthermore this limit satisfies an optimality equation which leads to near optimality with either explicit performance bounds or which are guaranteed to be asymptotically optimal. Our approach builds on (i) viewing quantization as a measurement kernel and thus a quantized MDP as a partially observed Markov decision process (POMDP), (ii) utilizing near optimality and convergence results of Q-learning for POMDPs, and (iii) finally, near-optimality of finite state model approximations for MDPs with weakly continuous kernels which we show to correspond to the fixed point of the constructed POMDP. Thus, our paper presents a very general convergence and approximation result for the applicability of Q-learning for continuous MDPs.
    GPT Can Solve Mathematical Problems Without a Calculator. (arXiv:2309.03241v1 [cs.LG])
    Previous studies have typically assumed that large language models are unable to accurately perform arithmetic operations, particularly multiplication of >8 digits, and operations involving decimals and fractions, without the use of calculator tools. This paper aims to challenge this misconception. With sufficient training data, a 2 billion-parameter language model can accurately perform multi-digit arithmetic operations with almost 100% accuracy without data leakage, significantly surpassing GPT-4 (whose multi-digit multiplication accuracy is only 4.3%). We also demonstrate that our MathGLM, fine-tuned from GLM-10B on a dataset with additional multi-step arithmetic operations and math problems described in text, achieves similar performance to GPT-4 on a 5,000-samples Chinese math problem test set.
    Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning. (arXiv:2309.03581v1 [cs.LG])
    Hyperparameter optimization (HPO) is important to leverage the full potential of machine learning (ML). In practice, users are often interested in multi-objective (MO) problems, i.e., optimizing potentially conflicting objectives, like accuracy and energy consumption. To tackle this, the vast majority of MO-ML algorithms return a Pareto front of non-dominated machine learning models to the user. Optimizing the hyperparameters of such algorithms is non-trivial as evaluating a hyperparameter configuration entails evaluating the quality of the resulting Pareto front. In literature, there are known indicators that assess the quality of a Pareto front (e.g., hypervolume, R2) by quantifying different properties (e.g., volume, proximity to a reference point). However, choosing the indicator that leads to the desired Pareto front might be a hard task for a user. In this paper, we propose a human-centered interactive HPO approach tailored towards multi-objective ML leveraging preference learning to extract desiderata from users that guide the optimization. Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator. Concretely, we leverage pairwise comparisons of distinct Pareto fronts to learn such an appropriate quality indicator. Then, we optimize the hyperparameters of the underlying MO-ML algorithm towards this learned indicator using a state-of-the-art HPO approach. In an experimental study targeting the environmental impact of ML, we demonstrate that our approach leads to substantially better Pareto fronts compared to optimizing based on a wrong indicator pre-selected by the user, and performs comparable in the case of an advanced user knowing which indicator to pick.
    Characterizing Lipschitz Stability of GNN for Fairness. (arXiv:2309.03648v1 [cs.LG])
    The Lipschitz bound, a technique from robust statistics, can limit the maximum changes in the output concerning the input, taking into account associated irrelevant biased factors. It is an efficient and provable method for examining the output stability of machine learning models without incurring additional computation costs. Recently, Graph Neural Networks (GNNs), which operate on non-Euclidean data, have gained significant attention. However, no previous research has investigated the GNN Lipschitz bounds to shed light on stabilizing model outputs, especially when working on non-Euclidean data with inherent biases. Given the inherent biases in common graph data used for GNN training, it poses a serious challenge to constraining the GNN output perturbations induced by input biases, thereby safeguarding fairness during training. Recently, despite the Lipschitz constant's use in controlling the stability of Euclideanneural networks, the calculation of the precise Lipschitz constant remains elusive for non-Euclidean neural networks like GNNs, especially within fairness contexts. To narrow this gap, we begin with the general GNNs operating on an attributed graph, and formulate a Lipschitz bound to limit the changes in the output regarding biases associated with the input. Additionally, we theoretically analyze how the Lipschitz constant of a GNN model could constrain the output perturbations induced by biases learned from data for fairness training. We experimentally validate the Lipschitz bound's effectiveness in limiting biases of the model output. Finally, from a training dynamics perspective, we demonstrate why the theoretical Lipschitz bound can effectively guide the GNN training to better trade-off between accuracy and fairness.
    Community-Based Hierarchical Positive-Unlabeled (PU) Model Fusion for Chronic Disease Prediction. (arXiv:2309.03386v1 [cs.LG])
    Positive-Unlabeled (PU) Learning is a challenge presented by binary classification problems where there is an abundance of unlabeled data along with a small number of positive data instances, which can be used to address chronic disease screening problem. State-of-the-art PU learning methods have resulted in the development of various risk estimators, yet they neglect the differences among distinct populations. To address this issue, we present a novel Positive-Unlabeled Learning Tree (PUtree) algorithm. PUtree is designed to take into account communities such as different age or income brackets, in tasks of chronic disease prediction. We propose a novel approach for binary decision-making, which hierarchically builds community-based PU models and then aggregates their deliverables. Our method can explicate each PU model on the tree for the optimized non-leaf PU node splitting. Furthermore, a mask-recovery data augmentation strategy enables sufficient training of the model in individual communities. Additionally, the proposed approach includes an adversarial PU risk estimator to capture hierarchical PU-relationships, and a model fusion network that integrates data from each tree path, resulting in robust binary classification results. We demonstrate the superior performance of PUtree as well as its variants on two benchmarks and a new diabetes-prediction dataset.
    Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction. (arXiv:2309.03619v1 [cs.SD])
    The choice of the objective function is crucial in emerging high-quality representations from self-supervised learning. This paper investigates how different formulations of the Barlow Twins (BT) objective impact downstream task performance for speech data. We propose Modified Barlow Twins (MBT) with normalized latents to enforce scale-invariance and evaluate on speaker identification, gender recognition and keyword spotting tasks. Our results show MBT improves representation generalization over original BT, especially when fine-tuning with limited target data. This highlights the importance of designing objectives that encourage invariant and transferable representations. Our analysis provides insights into how the BT learning objective can be tailored to produce speech representations that excel when adapted to new downstream tasks. This study is an important step towards developing reusable self-supervised speech representations.
    A Natural Gas Consumption Forecasting System for Continual Learning Scenarios based on Hoeffding Trees with Change Point Detection Mechanism. (arXiv:2309.03720v1 [cs.LG])
    Forecasting natural gas consumption, considering seasonality and trends, is crucial in planning its supply and consumption and optimizing the cost of obtaining it, mainly by industrial entities. However, in times of threats to its supply, it is also a critical element that guarantees the supply of this raw material to meet individual consumers' needs, ensuring society's energy security. This article introduces a novel multistep ahead forecasting of natural gas consumption with change point detection integration for model collection selection with continual learning capabilities using data stream processing. The performance of the forecasting models based on the proposed approach is evaluated in a complex real-world use case of natural gas consumption forecasting. We employed Hoeffding tree predictors as forecasting models and the Pruned Exact Linear Time (PELT) algorithm for the change point detection procedure. The change point detection integration enables selecting a different model collection for successive time frames. Thus, three model collection selection procedures (with and without an error feedback loop) are defined and evaluated for forecasting scenarios with various densities of detected change points. These models were compared with change point agnostic baseline approaches. Our experiments show that fewer change points result in a lower forecasting error regardless of the model collection selection procedure employed. Also, simpler model collection selection procedures omitting forecasting error feedback leads to more robust forecasting models suitable for continual learning tasks.
    Large Language Models as Optimizers. (arXiv:2309.03409v1 [cs.LG])
    Optimization is ubiquitous. While derivative-based algorithms have been powerful tools for various problems, the absence of gradient imposes challenges on many real-world applications. In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language. In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values, then the new solutions are evaluated and added to the prompt for the next optimization step. We first showcase OPRO on linear regression and traveling salesman problems, then move on to prompt optimization where the goal is to find instructions that maximize the task accuracy. With a variety of LLMs, we demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks.
    Non-inferiority of Deep Learning Acute Ischemic Stroke Segmentation on Non-Contrast CT Compared to Expert Neuroradiologists. (arXiv:2211.15341v3 [eess.IV] UPDATED)
    To determine if a convolutional neural network (CNN) deep learning model can accurately segment acute ischemic changes on non-contrast CT compared to neuroradiologists. Non-contrast CT (NCCT) examinations from 232 acute ischemic stroke patients who were enrolled in the DEFUSE 3 trial were included in this study. Three experienced neuroradiologists independently segmented hypodensity that reflected the ischemic core on each scan. The neuroradiologist with the most experience (expert A) served as the ground truth for deep learning model training. Two additional neuroradiologists (experts B and C) segmentations were used for data testing. The 232 studies were randomly split into training and test sets. The training set was further randomly divided into 5 folds with training and validation sets. A 3-dimensional CNN architecture was trained and optimized to predict the segmentations of expert A from NCCT. The performance of the model was assessed using a set of volume, overlap, and distance metrics using non-inferiority thresholds of 20%, 3ml, and 3mm. The optimized model trained on expert A was compared to test experts B and C. We used a one-sided Wilcoxon signed-rank test to test for the non-inferiority of the model-expert compared to the inter-expert agreement. The final model performance for the ischemic core segmentation task reached a performance of 0.46+-0.09 Surface Dice at Tolerance 5mm and 0.47+-0.13 Dice when trained on expert A. Compared to the two test neuroradiologists the model-expert agreement was non-inferior to the inter-expert agreement, p < 0.05. The CNN accurately delineates the hypodense ischemic core on NCCT in acute ischemic stroke patients with an accuracy comparable to neuroradiologists.
    Cross-domain Sound Recognition for Efficient Underwater Data Analysis. (arXiv:2309.03451v1 [cs.SD])
    This paper presents a novel deep learning approach for analyzing massive underwater acoustic data by leveraging a model trained on a broad spectrum of non-underwater (aerial) sounds. Recognizing the challenge in labeling vast amounts of underwater data, we propose a two-fold methodology to accelerate this labor-intensive procedure. The first part of our approach involves PCA and UMAP visualization of the underwater data using the feature vectors of an aerial sound recognition model. This enables us to cluster the data in a two dimensional space and listen to points within these clusters to understand their defining characteristics. This innovative method simplifies the process of selecting candidate labels for further training. In the second part, we train a neural network model using both the selected underwater data and the non-underwater dataset. We conducted a quantitative analysis to measure the precision, recall, and F1 score of our model for recognizing airgun sounds, a common type of underwater sound. The F1 score achieved by our model exceeded 84.3%, demonstrating the effectiveness of our approach in analyzing underwater acoustic data. The methodology presented in this paper holds significant potential to reduce the amount of labor required in underwater data analysis and opens up new possibilities for further research in the field of cross-domain data analysis.
    Chat Failures and Troubles: Reasons and Solutions. (arXiv:2309.03708v1 [cs.RO])
    This paper examines some common problems in Human-Robot Interaction (HRI) causing failures and troubles in Chat. A given use case's design decisions start with the suitable robot, the suitable chatting model, identifying common problems that cause failures, identifying potential solutions, and planning continuous improvement. In conclusion, it is recommended to use a closed-loop control algorithm that guides the use of trained Artificial Intelligence (AI) pre-trained models and provides vocabulary filtering, re-train batched models on new datasets, learn online from data streams, and/or use reinforcement learning models to self-update the trained models and reduce errors.
    Reduced Simulations for High-Energy Physics, a Middle Ground for Data-Driven Physics Research. (arXiv:2309.03780v1 [hep-ex])
    Subatomic particle track reconstruction (tracking) is a vital task in High-Energy Physics experiments. Tracking is exceptionally computationally challenging and fielded solutions, relying on traditional algorithms, do not scale linearly. Machine Learning (ML) assisted solutions are a promising answer. We argue that a complexity-reduced problem description and the data representing it, will facilitate the solution exploration workflow. We provide the REDuced VIrtual Detector (REDVID) as a complexity-reduced detector model and particle collision event simulator combo. REDVID is intended as a simulation-in-the-loop, to both generate synthetic data efficiently and to simplify the challenge of ML model design. The fully parametric nature of our tool, with regards to system-level configuration, while in contrast to physics-accurate simulations, allows for the generation of simplified data for research and education, at different levels. Resulting from the reduced complexity, we showcase the computational efficiency of REDVID by providing the computational cost figures for a multitude of simulation benchmarks. As a simulation and a generative tool for ML-assisted solution design, REDVID is highly flexible, reusable and open-source. Reference data sets generated with REDVID are publicly available.
    Improved theoretical guarantee for rank aggregation via spectral method. (arXiv:2309.03808v1 [stat.ML])
    Given pairwise comparisons between multiple items, how to rank them so that the ranking matches the observations? This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications. As it is generally NP-hard to find a global ranking that minimizes the mismatch (known as the Kemeny optimization), we focus on the Erd\"os-R\'enyi outliers (ERO) model for this ranking problem. Here, each pairwise comparison is a corrupted copy of the true score difference. We investigate spectral ranking algorithms that are based on unnormalized and normalized data matrices. The key is to understand their performance in recovering the underlying scores of each item from the observed data. This reduces to deriving an entry-wise perturbation error bound between the top eigenvectors of the unnormalized/normalized data matrix and its population counterpart. By using the leave-one-out technique, we provide a sharper $\ell_{\infty}$-norm perturbation bound of the eigenvectors and also derive an error bound on the maximum displacement for each item, with only $\Omega(n\log n)$ samples. Our theoretical analysis improves upon the state-of-the-art results in terms of sample complexity, and our numerical experiments confirm these theoretical findings.
    Uncovering Drift in Textual Data: An Unsupervised Method for Detecting and Mitigating Drift in Machine Learning Models. (arXiv:2309.03831v1 [cs.CL])
    Drift in machine learning refers to the phenomenon where the statistical properties of data or context, in which the model operates, change over time leading to a decrease in its performance. Therefore, maintaining a constant monitoring process for machine learning model performance is crucial in order to proactively prevent any potential performance regression. However, supervised drift detection methods require human annotation and consequently lead to a longer time to detect and mitigate the drift. In our proposed unsupervised drift detection method, we follow a two step process. Our first step involves encoding a sample of production data as the target distribution, and the model training data as the reference distribution. In the second step, we employ a kernel-based statistical test that utilizes the maximum mean discrepancy (MMD) distance metric to compare the reference and target distributions and estimate any potential drift. Our method also identifies the subset of production data that is the root cause of the drift. The models retrained using these identified high drift samples show improved performance on online customer experience quality metrics.
    Cross-Image Context Matters for Bongard Problems. (arXiv:2309.03468v1 [cs.CV])
    Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images, and then classifying whether or not a new query image depicts the key concept. On Bongard-HOI, a benchmark for natural-image Bongard problems, existing methods have only reached 66% accuracy (where chance is 50%). Low accuracy is often attributed to neural nets' lack of ability to find human-like symbolic rules. In this work, we point out that many existing methods are forfeiting accuracy due to a much simpler problem: they do not incorporate information contained in the support set as a whole, and rely instead on information extracted from individual supports. This is a critical issue, because unlike in few-shot learning tasks concerning object classification, the "key concept" in a typical Bongard problem can only be distinguished using multiple positives and multiple negatives. We explore a variety of simple methods to take this cross-image context into account, and demonstrate substantial gains over prior methods, leading to new state-of-the-art performance on Bongard-LOGO (75.3%) and Bongard-HOI (72.45%) and strong performance on the original Bongard problem set (60.84%).
    A Causal Perspective on Loan Pricing: Investigating the Impacts of Selection Bias on Identifying Bid-Response Functions. (arXiv:2309.03730v1 [cs.LG])
    In lending, where prices are specific to both customers and products, having a well-functioning personalized pricing policy in place is essential to effective business making. Typically, such a policy must be derived from observational data, which introduces several challenges. While the problem of ``endogeneity'' is prominently studied in the established pricing literature, the problem of selection bias (or, more precisely, bid selection bias) is not. We take a step towards understanding the effects of selection bias by posing pricing as a problem of causal inference. Specifically, we consider the reaction of a customer to price a treatment effect. In our experiments, we simulate varying levels of selection bias on a semi-synthetic dataset on mortgage loan applications in Belgium. We investigate the potential of parametric and nonparametric methods for the identification of individual bid-response functions. Our results illustrate how conventional methods such as logistic regression and neural networks suffer adversely from selection bias. In contrast, we implement state-of-the-art methods from causal machine learning and show their capability to overcome selection bias in pricing data.
    Broadband Ground Motion Synthesis via Generative Adversarial Neural Operators: Development and Validation. (arXiv:2309.03447v1 [physics.geo-ph])
    We present a data-driven model for ground-motion synthesis using a Generative Adversarial Neural Operator (GANO) that combines recent advancements in machine learning and open access strong motion data sets to generate three-component acceleration time histories conditioned on moment magnitude ($M$), rupture distance ($R_{rup}$), time-average shear-wave velocity at the top $30m$ ($V_{S30}$), and tectonic environment or style of faulting. We use Neural Operators, a resolution invariant architecture that guarantees that the model training is independent of the data sampling frequency. We first present the conditional ground-motion synthesis algorithm (referred to heretofore as cGM-GANO) and discuss its advantages compared to previous work. Next, we verify the cGM-GANO framework using simulated ground motions generated with the Southern California Earthquake Center (SCEC) Broadband Platform (BBP). We lastly train cGM-GANO on a KiK-net dataset from Japan, showing that the framework can recover the magnitude, distance, and $V_{S30}$ scaling of Fourier amplitude and pseudo-spectral accelerations. We evaluate cGM-GANO through residual analysis with the empirical dataset as well as by comparison with conventional Ground Motion Models (GMMs) for selected ground motion scenarios. Results show that cGM-GANO produces consistent median scaling with the GMMs for the corresponding tectonic environments. The largest misfit is observed at short distances due to the scarcity of training data. With the exception of short distances, the aleatory variability of the response spectral ordinates is also well captured, especially for subduction events due to the adequacy of training data. Applications of the presented framework include generation of risk-targeted ground motions for site-specific engineering applications.
    Filtration Surfaces for Dynamic Graph Classification. (arXiv:2309.03616v1 [cs.LG])
    Existing approaches for classifying dynamic graphs either lift graph kernels to the temporal domain, or use graph neural networks (GNNs). However, current baselines have scalability issues, cannot handle a changing node set, or do not take edge weight information into account. We propose filtration surfaces, a novel method that is scalable and flexible, to alleviate said restrictions. We experimentally validate the efficacy of our model and show that filtration surfaces outperform previous state-of-the-art baselines on datasets that rely on edge weight information. Our method does so while being either completely parameter-free or having at most one parameter, and yielding the lowest overall standard deviation.
    Testing properties of distributions in the streaming model. (arXiv:2309.03245v1 [cs.DS])
    We study distribution testing in the standard access model and the conditional access model when the memory available to the testing algorithm is bounded. In both scenarios, the samples appear in an online fashion and the goal is to test the properties of distribution using an optimal number of samples subject to a memory constraint on how many samples can be stored at a given time. First, we provide a trade-off between the sample complexity and the space complexity for testing identity when the samples are drawn according to the conditional access oracle. We then show that we can learn a succinct representation of a monotone distribution efficiently with a memory constraint on the number of samples that are stored that is almost optimal. We also show that the algorithm for monotone distributions can be extended to a larger class of decomposable distributions.
    A Robust Negative Learning Approach to Partial Domain Adaptation Using Source Prototypes. (arXiv:2309.03531v1 [cs.CV])
    This work proposes a robust Partial Domain Adaptation (PDA) framework that mitigates the negative transfer problem by incorporating a robust target-supervision strategy. It leverages ensemble learning and includes diverse, complementary label feedback, alleviating the effect of incorrect feedback and promoting pseudo-label refinement. Rather than relying exclusively on first-order moments for distribution alignment, our approach offers explicit objectives to optimize intra-class compactness and inter-class separation with the inferred source prototypes and highly-confident target samples in a domain-invariant fashion. Notably, we ensure source data privacy by eliminating the need to access the source data during the adaptation phase through a priori inference of source prototypes. We conducted a series of comprehensive experiments, including an ablation analysis, covering a range of partial domain adaptation tasks. Comprehensive evaluations on benchmark datasets corroborate our framework's enhanced robustness and generalization, demonstrating its superiority over existing state-of-the-art PDA approaches.
    Knowledge Graphs in Practice: Characterizing their Users, Challenges, and Visualization Opportunities. (arXiv:2304.01311v3 [cs.HC] UPDATED)
    This study presents insights from interviews with nineteen Knowledge Graph (KG) practitioners who work in both enterprise and academic settings on a wide variety of use cases. Through this study, we identify critical challenges experienced by KG practitioners when creating, exploring, and analyzing KGs that could be alleviated through visualization design. Our findings reveal three major personas among KG practitioners - KG Builders, Analysts, and Consumers - each of whom have their own distinct expertise and needs. We discover that KG Builders would benefit from schema enforcers, while KG Analysts need customizable query builders that provide interim query results. For KG Consumers, we identify a lack of efficacy for node-link diagrams, and the need for tailored domain-specific visualizations to promote KG adoption and comprehension. Lastly, we find that implementing KGs effectively in practice requires both technical and social solutions that are not addressed with current tools, technologies, and collaborative workflows. From the analysis of our interviews, we distill several visualization research directions to improve KG usability, including knowledge cards that balance digestibility and discoverability, timeline views to track temporal changes, interfaces that support organic discovery, and semantic explanations for AI and machine learning predictions.
    LDMRes-Net: Enabling Efficient Medical Image Segmentation on IoT and Edge Platforms. (arXiv:2306.06145v2 [eess.IV] UPDATED)
    In this study, we propose LDMRes-Net, a lightweight dual-multiscale residual block-based computational neural network tailored for medical image segmentation on IoT and edge platforms. Conventional U-Net-based models face challenges in meeting the speed and efficiency demands of real-time clinical applications, such as disease monitoring, radiation therapy, and image-guided surgery. LDMRes-Net overcomes these limitations with its remarkably low number of learnable parameters (0.072M), making it highly suitable for resource-constrained devices. The model's key innovation lies in its dual multi-residual block architecture, which enables the extraction of refined features on multiple scales, enhancing overall segmentation performance. To further optimize efficiency, the number of filters is carefully selected to prevent overlap, reduce training time, and improve computational efficiency. The study includes comprehensive evaluations, focusing on segmentation of the retinal image of vessels and hard exudates crucial for the diagnosis and treatment of ophthalmology. The results demonstrate the robustness, generalizability, and high segmentation accuracy of LDMRes-Net, positioning it as an efficient tool for accurate and rapid medical image segmentation in diverse clinical applications, particularly on IoT and edge platforms. Such advances hold significant promise for improving healthcare outcomes and enabling real-time medical image analysis in resource-limited settings.
    Efficient Single Object Detection on Image Patches with Early Exit Enhanced High-Precision CNNs. (arXiv:2309.03530v1 [cs.CV])
    This paper proposes a novel approach for detecting objects using mobile robots in the context of the RoboCup Standard Platform League, with a primary focus on detecting the ball. The challenge lies in detecting a dynamic object in varying lighting conditions and blurred images caused by fast movements. To address this challenge, the paper presents a convolutional neural network architecture designed specifically for computationally constrained robotic platforms. The proposed CNN is trained to achieve high precision classification of single objects in image patches and to determine their precise spatial positions. The paper further integrates Early Exits into the existing high-precision CNN architecture to reduce the computational cost of easily rejectable cases in the background class. The training process involves a composite loss function based on confidence and positional losses with dynamic weighting and data augmentation. The proposed approach achieves a precision of 100% on the validation dataset and a recall of almost 87%, while maintaining an execution time of around 170 $\mu$s per hypotheses. By combining the proposed approach with an Early Exit, a runtime optimization of more than 28%, on average, can be achieved compared to the original CNN. Overall, this paper provides an efficient solution for an enhanced detection of objects, especially the ball, in computationally constrained robotic platforms.
    Source Camera Identification and Detection in Digital Videos through Blind Forensics. (arXiv:2309.03353v1 [cs.CV])
    Source camera identification in digital videos is the problem of associating an unknown digital video with its source device, within a closed set of possible devices. The existing techniques in source detection of digital videos try to find a fingerprint of the actual source in the video in form of PRNU (Photo Response Non--Uniformity), and match it against the SPN (Sensor Pattern Noise) of each possible device. The highest correlation indicates the correct source. We investigate the problem of identifying a video source through a feature based approach using machine learning. In this paper, we present a blind forensic technique of video source authentication and identification, based on feature extraction, feature selection and subsequent source classification. The main aim is to determine whether a claimed source for a video is actually its original source. If not, we identify its original source. Our experimental results prove the efficiency of the proposed method compared to traditional fingerprint based technique.
    Byzantine-Robust Federated Learning with Variance Reduction and Differential Privacy. (arXiv:2309.03437v1 [cs.LG])
    Federated learning (FL) is designed to preserve data privacy during model training, where the data remains on the client side (i.e., IoT devices), and only model updates of clients are shared iteratively for collaborative learning. However, this process is vulnerable to privacy attacks and Byzantine attacks: the local model updates shared throughout the FL network will leak private information about the local training data, and they can also be maliciously crafted by Byzantine attackers to disturb the learning. In this paper, we propose a new FL scheme that guarantees rigorous privacy and simultaneously enhances system robustness against Byzantine attacks. Our approach introduces sparsification- and momentum-driven variance reduction into the client-level differential privacy (DP) mechanism, to defend against Byzantine attackers. The security design does not violate the privacy guarantee of the client-level DP mechanism; hence, our approach achieves the same client-level DP guarantee as the state-of-the-art. We conduct extensive experiments on both IID and non-IID datasets and different tasks and evaluate the performance of our approach against different Byzantine attacks by comparing it with state-of-the-art defense methods. The results of our experiments show the efficacy of our framework and demonstrate its ability to improve system robustness against Byzantine attacks while achieving a strong privacy guarantee.
    Automated Bioinformatics Analysis via AutoBA. (arXiv:2309.03242v1 [q-bio.GN])
    With the fast-growing and evolving omics data, the demand for streamlined and adaptable tools to handle the analysis continues to grow. In response to this need, we introduce Auto Bioinformatics Analysis (AutoBA), an autonomous AI agent based on a large language model designed explicitly for conventional omics data analysis. AutoBA simplifies the analytical process by requiring minimal user input while delivering detailed step-by-step plans for various bioinformatics tasks. Through rigorous validation by expert bioinformaticians, AutoBA's robustness and adaptability are affirmed across a diverse range of omics analysis cases, including whole genome sequencing (WGS), RNA sequencing (RNA-seq), single-cell RNA-seq, ChIP-seq, and spatial transcriptomics. AutoBA's unique capacity to self-design analysis processes based on input data variations further underscores its versatility. Compared with online bioinformatic services, AutoBA deploys the analysis locally, preserving data privacy. Moreover, different from the predefined pipeline, AutoBA has adaptability in sync with emerging bioinformatics tools. Overall, AutoBA represents a convenient tool, offering robustness and adaptability for complex omics data analysis.
    Which algorithm to select in sports timetabling?. (arXiv:2309.03229v1 [cs.AI])
    Any sports competition needs a timetable, specifying when and where teams meet each other. The recent International Timetabling Competition (ITC2021) on sports timetabling showed that, although it is possible to develop general algorithms, the performance of each algorithm varies considerably over the problem instances. This paper provides an instance space analysis for sports timetabling, resulting in powerful insights into the strengths and weaknesses of eight state-of-the-art algorithms. Based on machine learning techniques, we propose an algorithm selection system that predicts which algorithm is likely to perform best when given the characteristics of a sports timetabling problem instance. Furthermore, we identify which characteristics are important in making that prediction, providing insights in the performance of the algorithms, and suggestions to further improve them. Finally, we assess the empirical hardness of the instances. Our results are based on large computational experiments involving about 50 years of CPU time on more than 500 newly generated problem instances.
    Quantum-AI empowered Intelligent Surveillance: Advancing Public Safety Through Innovative Contraband Detection. (arXiv:2309.03231v1 [quant-ph])
    Surveillance systems have emerged as crucial elements in upholding peace and security in the modern world. Their ubiquity aids in monitoring suspicious activities effectively. However, in densely populated environments, continuous active monitoring becomes impractical, necessitating the development of intelligent surveillance systems. AI integration in the surveillance domain was a big revolution, however, speed issues have prevented its widespread implementation in the field. It has been observed that quantum artificial intelligence has led to a great breakthrough. Quantum artificial intelligence-based surveillance systems have shown to be more accurate as well as capable of performing well in real-time scenarios, which had never been seen before. In this research, a RentinaNet model is integrated with Quantum CNN and termed as Quantum-RetinaNet. By harnessing the Quantum capabilities of QCNN, Quantum-RetinaNet strikes a balance between accuracy and speed. This innovative integration positions it as a game-changer, addressing the challenges of active monitoring in densely populated scenarios. As demand for efficient surveillance solutions continues to grow, Quantum-RetinaNet offers a compelling alternative to existing CNN models, upholding accuracy standards without sacrificing real-time performance. The unique attributes of Quantum-RetinaNet have far-reaching implications for the future of intelligent surveillance. With its enhanced processing speed, it is poised to revolutionize the field, catering to the pressing need for rapid yet precise monitoring. As Quantum-RetinaNet becomes the new standard, it ensures public safety and security while pushing the boundaries of AI in surveillance.
    Relay Diffusion: Unifying diffusion process across resolutions for image synthesis. (arXiv:2309.03350v1 [cs.CV])
    Diffusion models achieved great success in image synthesis, but still face challenges in high-resolution generation. Through the lens of discrete cosine transformation, we find the main reason is that \emph{the same noise level on a higher resolution results in a higher Signal-to-Noise Ratio in the frequency domain}. In this work, we present Relay Diffusion Model (RDM), which transfers a low-resolution image or noise into an equivalent high-resolution one for diffusion model via blurring diffusion and block noise. Therefore, the diffusion process can continue seamlessly in any new resolution or model without restarting from pure noise or low-resolution conditioning. RDM achieves state-of-the-art FID on CelebA-HQ and sFID on ImageNet 256$\times$256, surpassing previous works such as ADM, LDM and DiT by a large margin. All the codes and checkpoints are open-sourced at \url{https://github.com/THUDM/RelayDiffusion}.
    Personalized Tucker Decomposition: Modeling Commonality and Peculiarity on Tensor Data. (arXiv:2309.03439v1 [cs.LG])
    We propose personalized Tucker decomposition (perTucker) to address the limitations of traditional tensor decomposition methods in capturing heterogeneity across different datasets. perTucker decomposes tensor data into shared global components and personalized local components. We introduce a mode orthogonality assumption and develop a proximal gradient regularized block coordinate descent algorithm that is guaranteed to converge to a stationary point. By learning unique and common representations across datasets, we demonstrate perTucker's effectiveness in anomaly detection, client classification, and clustering through a simulation study and two case studies on solar flare detection and tonnage signal classification.
    ViewMix: Augmentation for Robust Representation in Self-Supervised Learning. (arXiv:2309.03360v1 [cs.CV])
    Joint Embedding Architecture-based self-supervised learning methods have attributed the composition of data augmentations as a crucial factor for their strong representation learning capabilities. While regional dropout strategies have proven to guide models to focus on lesser indicative parts of the objects in supervised methods, it hasn't been adopted by self-supervised methods for generating positive pairs. This is because the regional dropout methods are not suitable for the input sampling process of the self-supervised methodology. Whereas dropping informative pixels from the positive pairs can result in inefficient training, replacing patches of a specific object with a different one can steer the model from maximizing the agreement between different positive pairs. Moreover, joint embedding representation learning methods have not made robustness their primary training outcome. To this end, we propose the ViewMix augmentation policy, specially designed for self-supervised learning, upon generating different views of the same image, patches are cut and pasted from one view to another. By leveraging the different views created by this augmentation strategy, multiple joint embedding-based self-supervised methodologies obtained better localization capability and consistently outperformed their corresponding baseline methods. It is also demonstrated that incorporating ViewMix augmentation policy promotes robustness of the representations in the state-of-the-art methods. Furthermore, our experimentation and analysis of compute times suggest that ViewMix augmentation doesn't introduce any additional overhead compared to other counterparts.
    Towards Comparable Knowledge Distillation in Semantic Image Segmentation. (arXiv:2309.03659v1 [cs.CV])
    Knowledge Distillation (KD) is one proposed solution to large model sizes and slow inference speed in semantic segmentation. In our research we identify 25 proposed distillation loss terms from 14 publications in the last 4 years. Unfortunately, a comparison of terms based on published results is often impossible, because of differences in training configurations. A good illustration of this problem is the comparison of two publications from 2022. Using the same models and dataset, Structural and Statistical Texture Distillation (SSTKD) reports an increase of student mIoU of 4.54 and a final performance of 29.19, while Adaptive Perspective Distillation (APD) only improves student performance by 2.06 percentage points, but achieves a final performance of 39.25. The reason for such extreme differences is often a suboptimal choice of hyperparameters and a resulting underperformance of the student model used as reference point. In our work, we reveal problems of insufficient hyperparameter tuning by showing that distillation improvements of two widely accepted frameworks, SKD and IFVD, vanish when hyperparameters are optimized sufficiently. To improve comparability of future research in the field, we establish a solid baseline for three datasets and two student models and provide extensive information on hyperparameter tuning. We find that only two out of eight techniques can compete with our simple baseline on the ADE20K dataset.
    Multi-Modality Guidance Network For Missing Modality Inference. (arXiv:2309.03452v1 [cs.CV])
    Multimodal models have gained significant success in recent years. Standard multimodal approaches often assume unchanged modalities from training stage to inference stage. In practice, however, many scenarios fail to satisfy such assumptions with missing modalities during inference, leading to limitations on where multimodal models can be applied. While existing methods mitigate the problem through reconstructing the missing modalities, it increases unnecessary computational cost, which could be just as critical, especially for large, deployed systems. To solve the problem from both sides, we propose a novel guidance network that promotes knowledge sharing during training, taking advantage of the multimodal representations to train better single-modality models for inference. Real-life experiment in violence detection shows that our proposed framework trains single-modality models that significantly outperform its traditionally trained counterparts while maintaining the same inference cost.
    Robotic Table Tennis: A Case Study into a High Speed Learning System. (arXiv:2309.03315v1 [cs.RO])
    We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, sensitivity to policy hyper-parameters, and choice of action space. A video demonstrating the components of the system and details of experimental results can be found at https://youtu.be/uFcnWjB42I0.
    No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function. (arXiv:2309.03224v1 [cs.AI])
    Large language models (LLMs) exhibit impressive language understanding and in-context learning abilities including natural language processing (NLP) tasks and challenging mathematical reasoning. However, due to the lack of process-supervision, applying PLMs to mathematical reasoning tasks often fail to generate correct reasoning steps and final answer even though solutions have high probabilities. To unleash the mathematical reasoning of finetuned-LLMs without any further fineutuning steps, we propose a method to endow LLMs with immediate reaction and delicate reasoning system via Monte Carlo Tree Search(MCTS) and a light energy function to rank the decision steps. In particular, We first re-formalize the finetuned-LLMs to a Residual-based Energy Model~(Residual-EBM) and apply noise contrastive estimation to estimate the parameters of energy function . Then we use MCTS with energy function as path verifier to search the output space and evaluating the reasoning path. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our method that improve the pass@1 of the finetuned-model without further finetuning or RLHF alignment by a substantial margin.
    Federated Learning Over Images: Vertical Decompositions and Pre-Trained Backbones Are Difficult to Beat. (arXiv:2309.03237v1 [cs.LG])
    We carefully evaluate a number of algorithms for learning in a federated environment, and test their utility for a variety of image classification tasks. We consider many issues that have not been adequately considered before: whether learning over data sets that do not have diverse sets of images affects the results; whether to use a pre-trained feature extraction "backbone"; how to evaluate learner performance (we argue that classification accuracy is not enough), among others. Overall, across a wide variety of settings, we find that vertically decomposing a neural network seems to give the best results, and outperforms more standard reconciliation-used methods.
    Using Neural Networks for Fast SAR Roughness Estimation of High Resolution Images. (arXiv:2309.03351v1 [cs.CV])
    The analysis of Synthetic Aperture Radar (SAR) imagery is an important step in remote sensing applications, and it is a challenging problem due to its inherent speckle noise. One typical solution is to model the data using the $G_I^0$ distribution and extract its roughness information, which in turn can be used in posterior imaging tasks, such as segmentation, classification and interpretation. This leads to the need of quick and reliable estimation of the roughness parameter from SAR data, especially with high resolution images. Unfortunately, traditional parameter estimation procedures are slow and prone to estimation failures. In this work, we proposed a neural network-based estimation framework that first learns how to predict underlying parameters of $G_I^0$ samples and then can be used to estimate the roughness of unseen data. We show that this approach leads to an estimator that is quicker, yields less estimation error and is less prone to failures than the traditional estimation procedures for this problem, even when we use a simple network. More importantly, we show that this same methodology can be generalized to handle image inputs and, even if trained on purely synthetic data for a few seconds, is able to perform real time pixel-wise roughness estimation for high resolution real SAR imagery.
    A Probabilistic Semi-Supervised Approach with Triplet Markov Chains. (arXiv:2309.03707v1 [stat.ML])
    Triplet Markov chains are general generative models for sequential data which take into account three kinds of random variables: (noisy) observations, their associated discrete labels and latent variables which aim at strengthening the distribution of the observations and their associated labels. However, in practice, we do not have at our disposal all the labels associated to the observations to estimate the parameters of such models. In this paper, we propose a general framework based on a variational Bayesian inference to train parameterized triplet Markov chain models in a semi-supervised context. The generality of our approach enables us to derive semi-supervised algorithms for a variety of generative models for sequential Bayesian classification.
    TSGBench: Time Series Generation Benchmark. (arXiv:2309.03755v1 [cs.LG])
    Synthetic Time Series Generation (TSG) is crucial in a range of applications, including data augmentation, anomaly detection, and privacy preservation. Although significant strides have been made in this field, existing methods exhibit three key limitations: (1) They often benchmark against similar model types, constraining a holistic view of performance capabilities. (2) The use of specialized synthetic and private datasets introduces biases and hampers generalizability. (3) Ambiguous evaluation measures, often tied to custom networks or downstream tasks, hinder consistent and fair comparison. To overcome these limitations, we introduce \textsf{TSGBench}, the inaugural TSG Benchmark, designed for a unified and comprehensive assessment of TSG methods. It comprises three modules: (1) a curated collection of publicly available, real-world datasets tailored for TSG, together with a standardized preprocessing pipeline; (2) a comprehensive evaluation measures suite including vanilla measures, new distance-based assessments, and visualization tools; (3) a pioneering generalization test rooted in Domain Adaptation (DA), compatible with all methods. We have conducted extensive experiments across ten real-world datasets from diverse domains, utilizing ten advanced TSG methods and twelve evaluation measures, all gauged through \textsf{TSGBench}. The results highlight its remarkable efficacy and consistency. More importantly, \textsf{TSGBench} delivers a statistical breakdown of method rankings, illuminating performance variations across different datasets and measures, and offering nuanced insights into the effectiveness of each method.
    EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation. (arXiv:2309.03244v1 [eess.IV])
    We introduce EGIC, a novel generative image compression method that allows traversing the distortion-perception curve efficiently from a single model. Specifically, we propose an implicitly encoded variant of image interpolation that predicts the residual between a MSE-optimized and GAN-optimized decoder output. On the receiver side, the user can then control the impact of the residual on the GAN-based reconstruction. Together with improved GAN-based building blocks, EGIC outperforms a wide-variety of perception-oriented and distortion-oriented baselines, including HiFiC, MRIC and DIRAC, while performing almost on par with VTM-20.0 on the distortion end. EGIC is simple to implement, very lightweight (e.g. 0.18x model parameters compared to HiFiC) and provides excellent interpolation characteristics, which makes it a promising candidate for practical applications targeting the low bit range.
    BoXHED2.0: Scalable boosting of dynamic survival analysis. (arXiv:2103.12591v5 [cs.LG] UPDATED)
    Modern applications of survival analysis increasingly involve time-dependent covariates. The Python package BoXHED2.0 is a tree-boosted hazard estimator that is fully nonparametric, and is applicable to survival settings far more general than right-censoring, including recurring events and competing risks. BoXHED2.0 is also scalable to the point of being on the same order of speed as parametric boosted survival models, in part because its core is written in C++ and it also supports the use of GPUs and multicore CPUs. BoXHED2.0 is available from PyPI and also from www.github.com/BoXHED.
    Domain Generalization for Mammographic Image Analysis with Contrastive Learning. (arXiv:2304.10226v5 [cs.CV] UPDATED)
    The deep learning technique has been shown to be effectively addressed several image analysis tasks in the computer-aided diagnosis scheme for mammography. The training of an efficacious deep learning model requires large data with diverse styles and qualities. The diversity of data often comes from the use of various scanners of vendors. But, in practice, it is impractical to collect a sufficient amount of diverse data for training. To this end, a novel contrastive learning is developed to equip the deep learning models with better style generalization capability. Specifically, the multi-style and multi-view unsupervised self-learning scheme is carried out to seek robust feature embedding against style diversity as a pretrained model. Afterward, the pretrained network is further fine-tuned to the downstream tasks, e.g., mass detection, matching, BI-RADS rating, and breast density classification. The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets. The experimental results suggest that the proposed domain generalization method can effectively improve performance of four mammographic image tasks on the data from both seen and unseen domains, and outperform many state-of-the-art (SOTA) generalization methods.
    Adversarially Robust Deep Learning with Optimal-Transport-Regularized Divergences. (arXiv:2309.03791v1 [cs.LG])
    We introduce the $ARMOR_D$ methods as novel approaches to enhancing the adversarial robustness of deep learning models. These methods are based on a new class of optimal-transport-regularized divergences, constructed via an infimal convolution between an information divergence and an optimal-transport (OT) cost. We use these as tools to enhance adversarial robustness by maximizing the expected loss over a neighborhood of distributions, a technique known as distributionally robust optimization. Viewed as a tool for constructing adversarial samples, our method allows samples to be both transported, according to the OT cost, and re-weighted, according to the information divergence. We demonstrate the effectiveness of our method on malware detection and image recognition applications and find that, to our knowledge, it outperforms existing methods at enhancing the robustness against adversarial attacks. $ARMOR_D$ yields the robustified accuracy of $98.29\%$ against $FGSM$ and $98.18\%$ against $PGD^{40}$ on the MNIST dataset, reducing the error rate by more than $19.7\%$ and $37.2\%$ respectively compared to prior methods. Similarly, in malware detection, a discrete (binary) data domain, $ARMOR_D$ improves the robustified accuracy under $rFGSM^{50}$ attack compared to the previous best-performing adversarial training methods by $37.0\%$ while lowering false negative and false positive rates by $51.1\%$ and $57.53\%$, respectively.
    Towards Personalized Federated Learning via Heterogeneous Model Reassembly. (arXiv:2308.08643v2 [cs.LG] UPDATED)
    This paper focuses on addressing the practical yet challenging problem of model heterogeneity in federated learning, where clients possess models with different network structures. To track this problem, we propose a novel framework called pFedHR, which leverages heterogeneous model reassembly to achieve personalized federated learning. In particular, we approach the problem of heterogeneous model personalization as a model-matching optimization task on the server side. Moreover, pFedHR automatically and dynamically generates informative and diverse personalized candidates with minimal human intervention. Furthermore, our proposed heterogeneous model reassembly technique mitigates the adverse impact introduced by using public data with different distributions from the client data to a certain extent. Experimental results demonstrate that pFedHR outperforms baselines on three datasets under both IID and Non-IID settings. Additionally, pFedHR effectively reduces the adverse impact of using different public data and dynamically generates diverse personalized models in an automated manner.
    Learning continuous-valued treatment effects through representation balancing. (arXiv:2309.03731v1 [cs.LG])
    Estimating the effects of treatments with an associated dose on an instance's outcome, the "dose response", is relevant in a variety of domains, from healthcare to business, economics, and beyond. Such effects, also known as continuous-valued treatment effects, are typically estimated from observational data, which may be subject to dose selection bias. This means that the allocation of doses depends on pre-treatment covariates. Previous studies have shown that conventional machine learning approaches fail to learn accurate individual estimates of dose responses under the presence of dose selection bias. In this work, we propose CBRNet, a causal machine learning approach to estimate an individual dose response from observational data. CBRNet adopts the Neyman-Rubin potential outcome framework and extends the concept of balanced representation learning for overcoming selection bias to continuous-valued treatments. Our work is the first to apply representation balancing in a continuous-valued treatment setting. We evaluate our method on a newly proposed benchmark. Our experiments demonstrate CBRNet's ability to accurately learn treatment effects under selection bias and competitive performance with respect to other state-of-the-art methods.
    Graph Fairing Convolutional Networks for Anomaly Detection. (arXiv:2010.10274v2 [cs.LG] UPDATED)
    Graph convolution is a fundamental building block for many deep neural networks on graph-structured data. In this paper, we introduce a simple, yet very effective graph convolutional network with skip connections for semi-supervised anomaly detection. The proposed layerwise propagation rule of our model is theoretically motivated by the concept of implicit fairing in geometry processing, and comprises a graph convolution module for aggregating information from immediate node neighbors and a skip connection module for combining layer-wise neighborhood representations. This propagation rule is derived from the iterative solution of the implicit fairing equation via the Jacobi method. In addition to capturing information from distant graph nodes through skip connections between the network's layers, our approach exploits both the graph structure and node features for learning discriminative node representations. These skip connections are integrated by design in our proposed network architecture. The effectiveness of our model is demonstrated through extensive experiments on five benchmark datasets, achieving better or comparable anomaly detection results against strong baseline methods. We also demonstrate through an ablation study that skip connection helps improve the model performance.
    Continual Pre-Training of Large Language Models: How to (re)warm your model?. (arXiv:2308.04014v2 [cs.CL] UPDATED)
    Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch$\unicode{x2013}$even for a large downstream dataset.
    Trinary Decision Trees for missing value handling. (arXiv:2309.03561v1 [stat.ML])
    This paper introduces the Trinary decision tree, an algorithm designed to improve the handling of missing data in decision tree regressors and classifiers. Unlike other approaches, the Trinary decision tree does not assume that missing values contain any information about the response. Both theoretical calculations on estimator bias and numerical illustrations using real data sets are presented to compare its performance with established algorithms in different missing data scenarios (Missing Completely at Random (MCAR), and Informative Missingness (IM)). Notably, the Trinary tree outperforms its peers in MCAR settings, especially when data is only missing out-of-sample, while lacking behind in IM settings. A hybrid model, the TrinaryMIA tree, which combines the Trinary tree and the Missing In Attributes (MIA) approach, shows robust performance in all types of missingness. Despite the potential drawback of slower training speed, the Trinary tree offers a promising and more accurate method of handling missing data in decision tree algorithms.
    Generating quantum feature maps using multi-objective genetic algorithm. (arXiv:2309.03307v1 [quant-ph])
    We present a novel approach for efficiently generating quantum feature maps for quantum-enhanced support vector machines, a kernel-based classifier, enabling access to high-dimensional Hilbert space. Our method employs a multi-objective genetic algorithm that simultaneously maximizes classification accuracy while minimizing both the local and non-local gate costs of the quantum feature map's circuit. To achieve this, we define distinct fitness functions for local gates and entanglement gates. Comparisons with classical classifiers are given in order to understand the advantages of using quantum machine learning. Surprisingly, our experiments reveal that the optimal configuration of quantum circuits for the quantum kernel method incorporates a proportional number of non-local gates for entanglement, contrary to previous literature where non-local gates were largely suppressed. Furthermore, we demonstrate that the separability indexes of data can be effectively leveraged to determine the number of non-local gates required for the quantum support vector machine's feature maps. This insight can significantly aid in selecting appropriate parameters, such as the entanglement parameter, in various quantum programming packages like quiskit.org based on data analysis. Our findings offer valuable guidance for enhancing the efficiency and accuracy of quantum machine learning algorithms.
    RAHNet: Retrieval Augmented Hybrid Network for Long-tailed Graph Classification. (arXiv:2308.02335v2 [cs.LG] UPDATED)
    Graph classification is a crucial task in many real-world multimedia applications, where graphs can represent various multimedia data types such as images, videos, and social networks. Previous efforts have applied graph neural networks (GNNs) in balanced situations where the class distribution is balanced. However, real-world data typically exhibit long-tailed class distributions, resulting in a bias towards the head classes when using GNNs and limited generalization ability over the tail classes. Recent approaches mainly focus on re-balancing different classes during model training, which fails to explicitly introduce new knowledge and sacrifices the performance of the head classes. To address these drawbacks, we propose a novel framework called Retrieval Augmented Hybrid Network (RAHNet) to jointly learn a robust feature extractor and an unbiased classifier in a decoupled manner. In the feature extractor training stage, we develop a graph retrieval module to search for relevant graphs that directly enrich the intra-class diversity for the tail classes. Moreover, we innovatively optimize a category-centered supervised contrastive loss to obtain discriminative representations, which is more suitable for long-tailed scenarios. In the classifier fine-tuning stage, we balance the classifier weights with two weight regularization techniques, i.e., Max-norm and weight decay. Experiments on various popular benchmarks verify the superiority of the proposed method against state-of-the-art approaches.
    Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models. (arXiv:2309.02976v2 [cs.RO] UPDATED)
    Humans excel at robust bipedal walking in complex natural environments. In each step, they adequately tune the interaction of biomechanical muscle dynamics and neuronal signals to be robust against uncertainties in ground conditions. However, it is still not fully understood how the nervous system resolves the musculoskeletal redundancy to solve the multi-objective control problem considering stability, robustness, and energy efficiency. In computer simulations, energy minimization has been shown to be a successful optimization target, reproducing natural walking with trajectory optimization or reflex-based control methods. However, these methods focus on particular motions at a time and the resulting controllers are limited when compensating for perturbations. In robotics, reinforcement learning~(RL) methods recently achieved highly stable (and efficient) locomotion on quadruped systems, but the generation of human-like walking with bipedal biomechanical models has required extensive use of expert data sets. This strong reliance on demonstrations often results in brittle policies and limits the application to new behaviors, especially considering the potential variety of movements for high-dimensional musculoskeletal models in 3D. Achieving natural locomotion with RL without sacrificing its incredible robustness might pave the way for a novel approach to studying human walking in complex natural environments. Videos: https://sites.google.com/view/naturalwalkingrl
    ClimSim: An open large-scale dataset for training high-resolution physics emulators in hybrid multi-scale climate simulators. (arXiv:2306.08754v3 [cs.LG] UPDATED)
    Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise predictions of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res, https://huggingface.co/datasets/LEAP/ClimSim_low-res, and https://huggingface.co/datasets/LEAP/ClimSim_low-res_aqua-planet) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.
    Max-Margin Token Selection in Attention Mechanism. (arXiv:2306.13596v3 [cs.LG] UPDATED)
    Attention mechanism is a central component of the transformer architecture which led to the phenomenal success of large language models. However, the theoretical principles underlying the attention mechanism are poorly understood, especially its nonconvex optimization dynamics. In this work, we explore the seminal softmax-attention model $f(\boldsymbol{X})=\langle \boldsymbol{Xv}, \texttt{softmax}(\boldsymbol{XWp})\rangle$, where $\boldsymbol{X}$ is the token sequence and $(\boldsymbol{v},\boldsymbol{W},\boldsymbol{p})$ are trainable parameters. We prove that running gradient descent on $\boldsymbol{p}$, or equivalently $\boldsymbol{W}$, converges in direction to a max-margin solution that separates $\textit{locally-optimal}$ tokens from non-optimal ones. This clearly formalizes attention as an optimal token selection mechanism. Remarkably, our results are applicable to general data and precisely characterize $\textit{optimality}$ of tokens in terms of the value embeddings $\boldsymbol{Xv}$ and problem geometry. We also provide a broader regularization path analysis that establishes the margin maximizing nature of attention even for nonlinear prediction heads. When optimizing $\boldsymbol{v}$ and $\boldsymbol{p}$ simultaneously with logistic loss, we identify conditions under which the regularization paths directionally converge to their respective hard-margin SVM solutions where $\boldsymbol{v}$ separates the input features based on their labels. Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$. Finally, we verify our theoretical findings via numerical experiments and provide insights.
    Gradient-Based Feature Learning under Structured Data. (arXiv:2309.03843v1 [stat.ML])
    Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods.
    Accelerating Numerical Solvers for Large-Scale Simulation of Dynamical System via NeurVec. (arXiv:2208.03680v2 [cs.CE] UPDATED)
    The large-scale simulation of dynamical systems is critical in numerous scientific and engineering disciplines. However, traditional numerical solvers are limited by the choice of step sizes when estimating integration, resulting in a trade-off between accuracy and computational efficiency. To address this challenge, we introduce a deep learning-based corrector called Neural Vector (NeurVec), which can compensate for integration errors and enable larger time step sizes in simulations. Our extensive experiments on a variety of complex dynamical system benchmarks demonstrate that NeurVec exhibits remarkable generalization capability on a continuous phase space, even when trained using limited and discrete data. NeurVec significantly accelerates traditional solvers, achieving speeds tens to hundreds of times faster while maintaining high levels of accuracy and stability. Moreover, NeurVec's simple-yet-effective design, combined with its ease of implementation, has the potential to establish a new paradigm for fast-solving differential equations based on deep learning.
    Convergence Analysis of Decentralized ASGD. (arXiv:2309.03754v1 [cs.LG])
    Over the last decades, Stochastic Gradient Descent (SGD) has been intensively studied by the Machine Learning community. Despite its versatility and excellent performance, the optimization of large models via SGD still is a time-consuming task. To reduce training time, it is common to distribute the training process across multiple devices. Recently, it has been shown that the convergence of asynchronous SGD (ASGD) will always be faster than mini-batch SGD. However, despite these improvements in the theoretical bounds, most ASGD convergence-rate proofs still rely on a centralized parameter server, which is prone to become a bottleneck when scaling out the gradient computations across many distributed processes. In this paper, we present a novel convergence-rate analysis for decentralized and asynchronous SGD (DASGD) which does not require partial synchronization among nodes nor restrictive network topologies. Specifically, we provide a bound of $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(QS_{avg}\epsilon^{-3/2}) + \mathcal{O}(S_{avg}\epsilon^{-1})$ for the convergence rate of DASGD, where $S_{avg}$ is the average staleness between models, $Q$ is a constant that bounds the norm of the gradients, and $\epsilon$ is a (small) error that is allowed within the bound. Furthermore, when gradients are not bounded, we prove the convergence rate of DASGD to be $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(\sqrt{\hat{S}_{avg}\hat{S}_{max}}\epsilon^{-1})$, with $\hat{S}_{max}$ and $\hat{S}_{avg}$ representing a loose version of the average and maximum staleness, respectively. Our convergence proof holds for a fixed stepsize and any non-convex, homogeneous, and L-smooth objective function. We anticipate that our results will be of high relevance for the adoption of DASGD by a broad community of researchers and developers.
    Deep Network Approximation: Beyond ReLU to Diverse Activation Functions. (arXiv:2307.06555v3 [cs.LG] UPDATED)
    This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $4N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, at the cost of slightly larger constants.
    DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models. (arXiv:2309.03883v1 [cs.CL])
    Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining. We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs that does not require conditioning on retrieved external knowledge nor additional fine-tuning. Our approach obtains the next-token distribution by contrasting the differences in logits obtained from projecting the later layers versus earlier layers to the vocabulary space, exploiting the fact that factual knowledge in an LLMs has generally been shown to be localized to particular transformer layers. We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts. DoLa consistently improves the truthfulness across multiple choices tasks and open-ended generation tasks, for example improving the performance of LLaMA family models on TruthfulQA by 12-17% absolute points, demonstrating its potential in making LLMs reliably generate truthful facts.
    Limitation of Characterizing Implicit Regularization by Data-independent Functions. (arXiv:2201.12198v2 [cs.LG] UPDATED)
    In recent years, understanding the implicit regularization of neural networks (NNs) has become a central task in deep learning theory. However, implicit regularization is itself not completely defined and well understood. In this work, we attempt to mathematically define and study implicit regularization. Importantly, we explore the limitations of a common approach to characterizing implicit regularization using data-independent functions. We propose two dynamical mechanisms, i.e., Two-point and One-point Overlapping mechanisms, based on which we provide two recipes for producing classes of one-hidden-neuron NNs that provably cannot be fully characterized by a type of or all data-independent functions. Following the previous works, our results further emphasize the profound data dependency of implicit regularization in general, inspiring us to study in detail the data dependency of NN implicit regularization in the future.
    Ensemble linear interpolators: The role of ensembling. (arXiv:2309.03354v1 [stat.ML])
    Interpolators are unstable. For example, the mininum $\ell_2$ norm least square interpolator exhibits unbounded test errors when dealing with noisy data. In this paper, we study how ensemble stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator. We focus on bagged linear interpolators, as bagging is a popular randomization-based ensemble method that can be implemented in parallel. We introduce the multiplier-bootstrap-based bagged least square estimator, which can then be formulated as an average of the sketched least square estimators. The proposed multiplier bootstrap encompasses the classical bootstrap with replacement as a special case, along with a more intriguing variant which we call the Bernoulli bootstrap. Focusing on the proportional regime where the sample size scales proportionally with the feature dimensionality, we investigate the out-of-sample prediction risks of the sketched and bagged least square estimators in both underparametrized and overparameterized regimes. Our results reveal the statistical roles of sketching and bagging. In particular, sketching modifies the aspect ratio and shifts the interpolation threshold of the minimum $\ell_2$ norm estimator. However, the risk of the sketched estimator continues to be unbounded around the interpolation threshold due to excessive variance. In stark contrast, bagging effectively mitigates this variance, leading to a bounded limiting out-of-sample prediction risk. To further understand this stability improvement property, we establish that bagging acts as a form of implicit regularization, substantiated by the equivalence of the bagged estimator with its explicitly regularized counterpart. We also discuss several extensions.
    On Root Cause Localization and Anomaly Mitigation through Causal Inference. (arXiv:2212.04031v2 [cs.LG] UPDATED)
    Due to a wide spectrum of applications in the real world, such as security, financial surveillance, and health risk, various deep anomaly detection models have been proposed and achieved state-of-the-art performance. However, besides being effective, in practice, the practitioners would further like to know what causes the abnormal outcome and how to further fix it. In this work, we propose RootCLAM, which aims to achieve Root Cause Localization and Anomaly Mitigation from a causal perspective. Especially, we formulate anomalies caused by external interventions on the normal causal mechanism and aim to locate the abnormal features with external interventions as root causes. After that, we further propose an anomaly mitigation approach that aims to recommend mitigation actions on abnormal features to revert the abnormal outcomes such that the counterfactuals guided by the causal mechanism are normal. Experiments on three datasets show that our approach can locate the root causes and further flip the abnormal labels.
    Alzheimer Disease Detection from Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning. (arXiv:2309.03664v1 [cs.LG])
    The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer's disease (AD) as well as of 5 pathological controls have been collected and analysed by Raman spectroscopy (RS). We investigated whether the raw and preprocessed Raman spectra could be used to distinguish AD from controls. First, we applied standard Machine Learning (ML) methods obtaining unsatisfactory results. Then, we applied ML to a set of topological descriptors extracted from raw spectra, achieving a very good classification accuracy (>87%). Although our results are preliminary, they indicate that RS and topological analysis together may provide an effective combination to confirm or disprove a clinical diagnosis of AD. The next steps will include enlarging the dataset of CSF samples to validate the proposed method better and, possibly, to understand if topological data analysis could support the characterization of AD subtypes.
    Prime and Modulate Learning: Generation of forward models with signed back-propagation and environmental cues. (arXiv:2309.03825v1 [cs.LG])
    Deep neural networks employing error back-propagation for learning can suffer from exploding and vanishing gradient problems. Numerous solutions have been proposed such as normalisation techniques or limiting activation functions to linear rectifying units. In this work we follow a different approach which is particularly applicable to closed-loop learning of forward models where back-propagation makes exclusive use of the sign of the error signal to prime the learning, whilst a global relevance signal modulates the rate of learning. This is inspired by the interaction between local plasticity and a global neuromodulation. For example, whilst driving on an empty road, one can allow for slow step-wise optimisation of actions, whereas, at a busy junction, an error must be corrected at once. Hence, the error is the priming signal and the intensity of the experience is a modulating factor in the weight change. The advantages of this Prime and Modulate paradigm is twofold: it is free from normalisation and it makes use of relevant cues from the environment to enrich the learning. We present a mathematical derivation of the learning rule in z-space and demonstrate the real-time performance with a robotic platform. The results show a significant improvement in the speed of convergence compared to that of the conventional back-propagation.
    Copula Representations and Error Surface Projections for the Exclusive Or Problem. (arXiv:1907.04483v2 [cs.LG] UPDATED)
    The exclusive or (xor) function is one of the simplest examples that illustrate why nonlinear feedforward networks are superior to linear regression for machine learning applications. We review the xor representation and approximation problems and discuss their solutions in terms of probabilistic logic and associative copula functions. After briefly reviewing the specification of feedforward networks, we compare the dynamics of learned error surfaces with different activation functions such as RELU and tanh through a set of colorful three-dimensional charts. The copula representations extend xor from Boolean to real values, thereby providing a convenient way to demonstrate the concept of cross-validation on in-sample and out-sample data sets. Our approach is pedagogical and is meant to be a machine learning prolegomenon.
    Towards provably efficient quantum algorithms for large-scale machine-learning models. (arXiv:2303.03428v4 [quant-ph] UPDATED)
    Large machine learning models are revolutionary technologies of artificial intelligence whose bottlenecks include huge computational expenses, power, and time used both in the pre-training and fine-tuning process. In this work, we show that fault-tolerant quantum computing could possibly provide provably efficient resolutions for generic (stochastic) gradient descent algorithms, scaling as $\mathcal{O}(T^2 \times \text{polylog}(n))$, where $n$ is the size of the models and $T$ is the number of iterations in the training, as long as the models are both sufficiently dissipative and sparse, with small learning rates. Based on earlier efficient quantum algorithms for dissipative differential equations, we find and prove that similar algorithms work for (stochastic) gradient descent, the primary algorithm for machine learning. In practice, we benchmark instances of large machine learning models from 7 million to 103 million parameters. We find that, in the context of sparse training, a quantum enhancement is possible at the early stage of learning after model pruning, motivating a sparse parameter download and re-upload scheme. Our work shows solidly that fault-tolerant quantum algorithms could potentially contribute to most state-of-the-art, large-scale machine-learning problems.
    Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming. (arXiv:2209.08901v3 [math.OC] UPDATED)
    The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has been recently extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to increase performance as well as solution quality. In this paper, we propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower bound routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small-size instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy which enforces pairwise constraints, we reduce the complexity of the problems arising in the children nodes. For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that the proposed algorithm globally solves, for the first time, real-world instances of size 10 times larger than those solved by state-of-the-art exact methods.
    Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields. (arXiv:2306.12760v2 [cs.CV] UPDATED)
    Editing a local region or a specific object in a 3D scene represented by a NeRF or consistently blending a new realistic object into the scene is challenging, mainly due to the implicit nature of the scene representation. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.
    Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models. (arXiv:2307.14971v2 [cs.CV] UPDATED)
    With the overwhelming trend of mask image modeling led by MAE, generative pre-training has shown a remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images has more precise supervision than its point cloud counterpart, thus assisting 3D backbones to have a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results have proved the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at https://github.com/wangzy22/TAP.
    Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning. (arXiv:2309.03839v1 [cs.RO])
    Adaptive interfaces can help users perform sequential decision-making tasks like robotic teleoperation given noisy, high-dimensional command signals (e.g., from a brain-computer interface). Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users, but tend to be limited by the amount of data that they can collect from individual users in practice. In this paper, we propose a reinforcement learning algorithm to address this by training an interface to map raw command signals to actions using a combination of offline pre-training and online fine-tuning. To address the challenges posed by noisy command signals and sparse rewards, we develop a novel method for representing and inferring the user's long-term intent for a given trajectory. We primarily evaluate our method's ability to assist users who can only communicate through noisy, high-dimensional input channels through a user study in which 12 participants performed a simulated navigation task by using their eye gaze to modulate a 128-dimensional command signal from their webcam. The results show that our method enables successful goal navigation more often than a baseline directional interface, by learning to denoise user commands signals and provide shared autonomy assistance. We further evaluate on a simulated Sawyer pushing task with eye gaze control, and the Lunar Lander game with simulated user commands, and find that our method improves over baseline interfaces in these domains as well. Extensive ablation experiments with simulated user commands empirically motivate each component of our method.
    AnthroNet: Conditional Generation of Humans via Anthropometrics. (arXiv:2309.03812v1 [cs.CV])
    We present a novel human body model formulated by an extensive set of anthropocentric measurements, which is capable of generating a wide range of human body shapes and poses. The proposed model enables direct modeling of specific human identities through a deep generative architecture, which can produce humans in any arbitrary pose. It is the first of its kind to have been trained end-to-end using only synthetically generated data, which not only provides highly accurate human mesh representations but also allows for precise anthropometry of the body. Moreover, using a highly diverse animation library, we articulated our synthetic humans' body and hands to maximize the diversity of the learnable priors for model training. Our model was trained on a dataset of $100k$ procedurally-generated posed human meshes and their corresponding anthropometric measurements. Our synthetic data generator can be used to generate millions of unique human identities and poses for non-commercial academic research purposes.
    ReFit: A Framework for Refinement of Weakly Supervised Semantic Segmentation using Object Border Fitting for Medical Images. (arXiv:2303.07853v2 [cs.CV] UPDATED)
    Weakly Supervised Semantic Segmentation (WSSS) relying only on image-level supervision is a promising approach to deal with the need for Segmentation networks, especially for generating a large number of pixel-wise masks in a given dataset. However, most state-of-the-art image-level WSSS techniques lack an understanding of the geometric features embedded in the images since the network cannot derive any object boundary information from just image-level labels. We define a boundary here as the line separating an object and its background, or two different objects. To address this drawback, we are proposing our novel ReFit framework, which deploys state-of-the-art class activation maps combined with various post-processing techniques in order to achieve fine-grained higher-accuracy segmentation masks. To achieve this, we investigate a state-of-the-art unsupervised segmentation network that can be used to construct a boundary map, which enables ReFit to predict object locations with sharper boundaries. By applying our method to WSSS predictions, we achieved up to 10% improvement over the current state-of-the-art WSSS methods for medical imaging. The framework is open-source, to ensure that our results are reproducible, and accessible online at https://github.com/bharathprabakaran/ReFit.
    Adversarial Likelihood Estimation With One-Way Flows. (arXiv:2307.09882v2 [cs.LG] UPDATED)
    Generative Adversarial Networks (GANs) can produce high-quality samples, but do not provide an estimate of the probability density around the samples. However, it has been noted that maximizing the log-likelihood within an energy-based setting can lead to an adversarial framework where the discriminator provides unnormalized density (often called energy). We further develop this perspective, incorporate importance sampling, and show that 1) Wasserstein GAN performs a biased estimate of the partition function, and we propose instead to use an unbiased estimator; and 2) when optimizing for likelihood, one must maximize generator entropy. This is hypothesized to provide a better mode coverage. Different from previous works, we explicitly compute the density of the generated samples. This is the key enabler to designing an unbiased estimator of the partition function and computation of the generator entropy term. The generator density is obtained via a new type of flow network, called one-way flow network, that is less constrained in terms of architecture, as it does not require a tractable inverse function. Our experimental results show that our method converges faster, produces comparable sample quality to GANs with similar architecture, successfully avoids over-fitting to commonly used datasets and produces smooth low-dimensional latent representations of the training data.
    Pure Exploration in Bandits with Linear Constraints. (arXiv:2306.12774v2 [cs.LG] UPDATED)
    We address the problem of identifying the optimal policy with a fixed confidence level in a multi-armed bandit setup, when \emph{the arms are subject to linear constraints}. Unlike the standard best-arm identification problem which is well studied, the optimal policy in this case may not be deterministic and could mix between several arms. This changes the geometry of the problem which we characterize via an information-theoretic lower bound. We introduce two asymptotically optimal algorithms for this setting, one based on the Track-and-Stop method and the other based on a game-theoretic approach. Both these algorithms try to track an optimal allocation based on the lower bound and computed by a weighted projection onto the boundary of a normal cone. Finally, we provide empirical results that validate our bounds and visualize how constraints change the hardness of the problem.
    Better Practices for Domain Adaptation. (arXiv:2309.03879v1 [cs.LG])
    Distribution shifts are all too common in real-world applications of machine learning. Domain adaptation (DA) aims to address this by providing various frameworks for adapting models to the deployment data without using labels. However, the domain shift scenario raises a second more subtle challenge: the difficulty of performing hyperparameter optimisation (HPO) for these adaptation algorithms without access to a labelled validation set. The unclear validation protocol for DA has led to bad practices in the literature, such as performing HPO using the target test labels when, in real-world scenarios, they are not available. This has resulted in over-optimism about DA research progress compared to reality. In this paper, we analyse the state of DA when using good evaluation practice, by benchmarking a suite of candidate validation criteria and using them to assess popular adaptation algorithms. We show that there are challenges across all three branches of domain adaptation methodology including Unsupervised Domain Adaptation (UDA), Source-Free Domain Adaptation (SFDA), and Test Time Adaptation (TTA). While the results show that realistically achievable performance is often worse than expected, they also show that using proper validation splits is beneficial, as well as showing that some previously unexplored validation metrics provide the best options to date. Altogether, our improved practices covering data, training, validation and hyperparameter optimisation form a new rigorous pipeline to improve benchmarking, and hence research progress, within this important field going forward.
    Dataset Generation and Bonobo Classification from Weakly Labelled Videos. (arXiv:2309.03671v1 [cs.CV])
    This paper presents a bonobo detection and classification pipeline built from the commonly used machine learning methods. Such application is motivated by the need to test bonobos in their enclosure using touch screen devices without human assistance. This work introduces a newly acquired dataset based on bonobo recordings generated semi-automatically. The recordings are weakly labelled and fed to a macaque detector in order to spatially detect the individual present in the video. Handcrafted features coupled with different classification algorithms and deep-learning methods using a ResNet architecture are investigated for bonobo identification. Performance is compared in terms of classification accuracy on the splits of the database using different data separation methods. We demonstrate the importance of data preparation and how a wrong data separation can lead to false good results. Finally, after a meaningful separation of the data, the best classification performance is obtained using a fine-tuned ResNet model and reaches 75% of accuracy.
    A computationally lightweight safe learning algorithm. (arXiv:2309.03672v1 [eess.SY])
    Safety is an essential asset when learning control policies for physical systems, as violating safety constraints during training can lead to expensive hardware damage. In response to this need, the field of safe learning has emerged with algorithms that can provide probabilistic safety guarantees without knowledge of the underlying system dynamics. Those algorithms often rely on Gaussian process inference. Unfortunately, Gaussian process inference scales cubically with the number of data points, limiting applicability to high-dimensional and embedded systems. In this paper, we propose a safe learning algorithm that provides probabilistic safety guarantees but leverages the Nadaraya-Watson estimator instead of Gaussian processes. For the Nadaraya-Watson estimator, we can reach logarithmic scaling with the number of data points. We provide theoretical guarantees for the estimates, embed them into a safe learning algorithm, and show numerical experiments on a simulated seven-degrees-of-freedom robot manipulator.
    Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck. (arXiv:2309.03800v1 [cs.LG])
    This work investigates the nuanced algorithm design choices for deep learning in the presence of computational-statistical gaps. We begin by considering offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron. This lower bound can be interpreted as a multi-resource tradeoff frontier: successful learning can only occur if one is sufficiently rich (large model), knowledgeable (large dataset), patient (many training iterations), or lucky (many random guesses). We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting. Here, width plays the role of parallel search: it amplifies the probability of finding "lottery ticket" neurons, which learn sparse features more sample-efficiently. Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning. We demonstrate improved sample efficiency on tabular classification benchmarks by using wide, sparsely-initialized MLP models; these networks sometimes outperform tuned random forests.
    CPU frequency scheduling of real-time applications on embedded devices with temporal encoding-based deep reinforcement learning. (arXiv:2309.03779v1 [cs.LG])
    Small devices are frequently used in IoT and smart-city applications to perform periodic dedicated tasks with soft deadlines. This work focuses on developing methods to derive efficient power-management methods for periodic tasks on small devices. We first study the limitations of the existing Linux built-in methods used in small devices. We illustrate three typical workload/system patterns that are challenging to manage with Linux's built-in solutions. We develop a reinforcement-learning-based technique with temporal encoding to derive an effective DVFS governor even with the presence of the three system patterns. The derived governor uses only one performance counter, the same as the built-in Linux mechanism, and does not require an explicit task model for the workload. We implemented a prototype system on the Nvidia Jetson Nano Board and experimented with it with six applications, including two self-designed and four benchmark applications. Under different deadline constraints, our approach can quickly derive a DVFS governor that can adapt to performance requirements and outperform the built-in Linux approach in energy saving. On Mibench workloads, with performance slack ranging from 0.04 s to 0.4 s, the proposed method can save 3% - 11% more energy compared to Ondemand. AudioReg and FaceReg applications tested have 5%- 14% energy-saving improvement. We have open-sourced the implementation of our in-kernel quantized neural network engine. The codebase can be found at: https://github.com/coladog/tinyagent.
    Your Battery Is a Blast! Safeguarding Against Counterfeit Batteries with Authentication. (arXiv:2309.03607v1 [cs.CR])
    Lithium-ion (Li-ion) batteries are the primary power source in various applications due to their high energy and power density. Their market was estimated to be up to 48 billion U.S. dollars in 2022. However, the widespread adoption of Li-ion batteries has resulted in counterfeit cell production, which can pose safety hazards to users. Counterfeit cells can cause explosions or fires, and their prevalence in the market makes it difficult for users to detect fake cells. Indeed, current battery authentication methods can be susceptible to advanced counterfeiting techniques and are often not adaptable to various cells and systems. In this paper, we improve the state of the art on battery authentication by proposing two novel methodologies, DCAuth and EISthentication, which leverage the internal characteristics of each cell through Machine Learning models. Our methods automatically authenticate lithium-ion battery models and architectures using data from their regular usage without the need for any external device. They are also resilient to the most common and critical counterfeit practices and can scale to several batteries and devices. To evaluate the effectiveness of our proposed methodologies, we analyze time-series data from a total of 20 datasets that we have processed to extract meaningful features for our analysis. Our methods achieve high accuracy in battery authentication for both architectures (up to 0.99) and models (up to 0.96). Moreover, our methods offer comparable identification performances. By using our proposed methodologies, manufacturers can ensure that devices only use legitimate batteries, guaranteeing the operational state of any system and safety measures for the users.
    Temporal Inductive Path Neural Network for Temporal Knowledge Graph Reasoning. (arXiv:2309.03251v1 [cs.AI])
    Temporal Knowledge Graph (TKG) is an extension of traditional Knowledge Graph (KG) that incorporates the dimension of time. Reasoning on TKGs is a crucial task that aims to predict future facts based on historical occurrences. The key challenge lies in uncovering structural dependencies within historical subgraphs and temporal patterns. Most existing approaches model TKGs relying on entity modeling, as nodes in the graph play a crucial role in knowledge representation. However, the real-world scenario often involves an extensive number of entities, with new entities emerging over time. This makes it challenging for entity-dependent methods to cope with extensive volumes of entities, and effectively handling newly emerging entities also becomes a significant challenge. Therefore, we propose Temporal Inductive Path Neural Network (TiPNN), which models historical information in an entity-independent perspective. Specifically, TiPNN adopts a unified graph, namely history temporal graph, to comprehensively capture and encapsulate information from history. Subsequently, we utilize the defined query-aware temporal paths to model historical path information related to queries on history temporal graph for the reasoning. Extensive experiments illustrate that the proposed model not only attains significant performance enhancements but also handles inductive settings, while additionally facilitating the provision of reasoning evidence through history temporal graphs.
    Feature Enhancer Segmentation Network (FES-Net) for Vessel Segmentation. (arXiv:2309.03535v1 [eess.IV])
    Diseases such as diabetic retinopathy and age-related macular degeneration pose a significant risk to vision, highlighting the importance of precise segmentation of retinal vessels for the tracking and diagnosis of progression. However, existing vessel segmentation methods that heavily rely on encoder-decoder structures struggle to capture contextual information about retinal vessel configurations, leading to challenges in reconciling semantic disparities between encoder and decoder features. To address this, we propose a novel feature enhancement segmentation network (FES-Net) that achieves accurate pixel-wise segmentation without requiring additional image enhancement steps. FES-Net directly processes the input image and utilizes four prompt convolutional blocks (PCBs) during downsampling, complemented by a shallow upsampling approach to generate a binary mask for each class. We evaluate the performance of FES-Net on four publicly available state-of-the-art datasets: DRIVE, STARE, CHASE, and HRF. The evaluation results clearly demonstrate the superior performance of FES-Net compared to other competitive approaches documented in the existing literature.
    A Majority Invariant Approach to Patch Robustness Certification for Deep Learning Models. (arXiv:2308.00452v2 [cs.LG] UPDATED)
    Patch robustness certification ensures no patch within a given bound on a sample can manipulate a deep learning model to predict a different label. However, existing techniques cannot certify samples that cannot meet their strict bars at the classifier or patch region levels. This paper proposes MajorCert. MajorCert firstly finds all possible label sets manipulatable by the same patch region on the same sample across the underlying classifiers, then enumerates their combinations element-wise, and finally checks whether the majority invariant of all these combinations is intact to certify samples.
    Fast FixMatch: Faster Semi-Supervised Learning with Curriculum Batch Size. (arXiv:2309.03469v1 [cs.LG])
    Advances in Semi-Supervised Learning (SSL) have almost entirely closed the gap between SSL and Supervised Learning at a fraction of the number of labels. However, recent performance improvements have often come \textit{at the cost of significantly increased training computation}. To address this, we propose Curriculum Batch Size (CBS), \textit{an unlabeled batch size curriculum which exploits the natural training dynamics of deep neural networks.} A small unlabeled batch size is used in the beginning of training and is gradually increased to the end of training. A fixed curriculum is used regardless of dataset, model or number of epochs, and reduced training computations is demonstrated on all settings. We apply CBS, strong labeled augmentation, Curriculum Pseudo Labeling (CPL) \citep{FlexMatch} to FixMatch \citep{FixMatch} and term the new SSL algorithm Fast FixMatch. We perform an ablation study to show that strong labeled augmentation and/or CPL do not significantly reduce training computations, but, in synergy with CBS, they achieve optimal performance. Fast FixMatch also achieves substantially higher data utilization compared to previous state-of-the-art. Fast FixMatch achieves between $2.1\times$ - $3.4\times$ reduced training computations on CIFAR-10 with all but 40, 250 and 4000 labels removed, compared to vanilla FixMatch, while attaining the same cited state-of-the-art error rate \citep{FixMatch}. Similar results are achieved for CIFAR-100, SVHN and STL-10. Finally, Fast MixMatch achieves between $2.6\times$ - $3.3\times$ reduced training computations in federated SSL tasks and online/streaming learning SSL tasks, which further demonstrate the generializbility of Fast MixMatch to different scenarios and tasks.
    DTW+S: Shape-based Comparison of Time-series with Ordered Local Trend. (arXiv:2309.03579v1 [cs.LG])
    Measuring distance or similarity between time-series data is a fundamental aspect of many applications including classification and clustering. Existing measures may fail to capture similarities due to local trends (shapes) and may even produce misleading results. Our goal is to develop a measure that looks for similar trends occurring around similar times and is easily interpretable for researchers in applied domains. This is particularly useful for applications where time-series have a sequence of meaningful local trends that are ordered, such as in epidemics (a surge to an increase to a peak to a decrease). We propose a novel measure, DTW+S, which creates an interpretable "closeness-preserving" matrix representation of the time-series, where each column represents local trends, and then it applies Dynamic Time Warping to compute distances between these matrices. We present a theoretical analysis that supports the choice of this representation. We demonstrate the utility of DTW+S in ensemble building and clustering of epidemic curves. We also demonstrate that our approach results in better classification compared to Dynamic Time Warping for a class of datasets, particularly when local trends rather than scale play a decisive role.
    GraPhSyM: Graph Physical Synthesis Model. (arXiv:2308.03944v2 [cs.LG] UPDATED)
    In this work, we introduce GraPhSyM, a Graph Attention Network (GATv2) model for fast and accurate estimation of post-physical synthesis circuit delay and area metrics from pre-physical synthesis circuit netlists. Once trained, GraPhSyM provides accurate visibility of final design metrics to early EDA stages, such as logic synthesis, without running the slow physical synthesis flow, enabling global co-optimization across stages. Additionally, the swift and precise feedback provided by GraPhSyM is instrumental for machine-learning-based EDA optimization frameworks. Given a gate-level netlist of a circuit represented as a graph, GraPhSyM utilizes graph structure, connectivity, and electrical property features to predict the impact of physical synthesis transformations such as buffer insertion and gate sizing. When trained on a dataset of 6000 prefix adder designs synthesized at an aggressive delay target, GraPhSyM can accurately predict the post-synthesis delay (98.3%) and area (96.1%) metrics of unseen adders with a fast 0.22s inference time. Furthermore, we illustrate the compositionality of GraPhSyM by employing the model trained on a fixed delay target to accurately anticipate post-synthesis metrics at a variety of unseen delay targets. Lastly, we report promising generalization capabilities of the GraPhSyM model when it is evaluated on circuits different from the adders it was exclusively trained on. The results show the potential for GraPhSyM to serve as a powerful tool for advanced optimization techniques and as an oracle for EDA machine learning frameworks.
    A State Representation for Diminishing Rewards. (arXiv:2309.03710v1 [cs.LG])
    A common setting in multitask reinforcement learning (RL) demands that an agent rapidly adapt to various stationary reward functions randomly sampled from a fixed distribution. In such situations, the successor representation (SR) is a popular framework which supports rapid policy evaluation by decoupling a policy's expected discounted, cumulative state occupancies from a specific reward function. However, in the natural world, sequential tasks are rarely independent, and instead reflect shifting priorities based on the availability and subjective perception of rewarding stimuli. Reflecting this disjunction, in this paper we study the phenomenon of diminishing marginal utility and introduce a novel state representation, the $\lambda$ representation ($\lambda$R) which, surprisingly, is required for policy evaluation in this setting and which generalizes the SR as well as several other state representations from the literature. We establish the $\lambda$R's formal properties and examine its normative advantages in the context of machine learning, as well as its usefulness for studying natural behaviors, particularly foraging.
    DeepAD: A Robust Deep Learning Model of Alzheimer's Disease Progression for Real-World Clinical Applications. (arXiv:2203.09096v5 [cs.LG] UPDATED)
    The ability to predict the future trajectory of a patient is a key step toward the development of therapeutics for complex diseases such as Alzheimer's disease (AD). However, most machine learning approaches developed for prediction of disease progression are either single-task or single-modality models, which can not be directly adopted to our setting involving multi-task learning with high dimensional images. Moreover, most of those approaches are trained on a single dataset (i.e. cohort), which can not be generalized to other cohorts. We propose a novel multimodal multi-task deep learning model to predict AD progression by analyzing longitudinal clinical and neuroimaging data from multiple cohorts. Our proposed model integrates high dimensional MRI features from a 3D convolutional neural network with other data modalities, including clinical and demographic information, to predict the future trajectory of patients. Our model employs an adversarial loss to alleviate the study-specific imaging bias, in particular the inter-study domain shifts. In addition, a Sharpness-Aware Minimization (SAM) optimization technique is applied to further improve model generalization. The proposed model is trained and tested on various datasets in order to evaluate and validate the results. Our results showed that 1) our model yields significant improvement over the baseline models, and 2) models using extracted neuroimaging features from 3D convolutional neural network outperform the same models when applied to MRI-derived volumetric features.
    Scalable Learning of Intrusion Responses through Recursive Decomposition. (arXiv:2309.03292v1 [eess.SY])
    We study automated intrusion response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed stochastic game. To solve the game we follow an approach where attack and defense strategies co-evolve through reinforcement learning and self-play toward an equilibrium. Solutions proposed in previous work prove the feasibility of this approach for small infrastructures but do not scale to realistic scenarios due to the exponential growth in computational complexity with the infrastructure size. We address this problem by introducing a method that recursively decomposes the game into subgames which can be solved in parallel. Applying optimal stopping theory we show that the best response strategies in these subgames exhibit threshold structures, which allows us to compute them efficiently. To solve the decomposed game we introduce an algorithm called Decompositional Fictitious Self-Play (DFSP), which learns Nash equilibria through stochastic approximation. We evaluate the learned strategies in an emulation environment where real intrusions and response actions can be executed. The results show that the learned strategies approximate an equilibrium and that DFSP significantly outperforms a state-of-the-art algorithm for a realistic infrastructure configuration.
    Implicit Design Choices and Their Impact on Emotion Recognition Model Development and Evaluation. (arXiv:2309.03238v1 [cs.LG])
    Emotion recognition is a complex task due to the inherent subjectivity in both the perception and production of emotions. The subjectivity of emotions poses significant challenges in developing accurate and robust computational models. This thesis examines critical facets of emotion recognition, beginning with the collection of diverse datasets that account for psychological factors in emotion production. To handle the challenge of non-representative training data, this work collects the Multimodal Stressed Emotion dataset, which introduces controlled stressors during data collection to better represent real-world influences on emotion production. To address issues with label subjectivity, this research comprehensively analyzes how data augmentation techniques and annotation schemes impact emotion perception and annotator labels. It further handles natural confounding variables and variations by employing adversarial networks to isolate key factors like stress from learned emotion representations during model training. For tackling concerns about leakage of sensitive demographic variables, this work leverages adversarial learning to strip sensitive demographic information from multimodal encodings. Additionally, it proposes optimized sociological evaluation metrics aligned with cost-effective, real-world needs for model testing. This research advances robust, practical emotion recognition through multifaceted studies of challenges in datasets, labels, modeling, demographic and membership variable encoding in representations, and evaluation. The groundwork has been laid for cost-effective, generalizable emotion recognition models that are less likely to encode sensitive demographic information.
    A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation. (arXiv:2309.02539v2 [eess.AS] UPDATED)
    Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue stem, the music stem, and the effects stem from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psycho-acoustically motivated frequency scales were used to inform the band definitions which are now defined with redundancy for more reliable feature extraction. A loss function motivated by the signal-to-noise ratio and the sparsity-promoting property of the 1-norm was proposed. We additionally exploit the information-sharing property of a common-encoder setup to reduce computational complexity during both training and inference, improve separation performance for hard-to-generalize classes of sounds, and allow flexibility during inference time with easily detachable decoders. Our best model sets the state of the art on the Divide and Remaster dataset with performance above the ideal ratio mask for the dialogue stem.
    CenTime: Event-Conditional Modelling of Censoring in Survival Analysis. (arXiv:2309.03851v1 [cs.LG])
    Survival analysis is a valuable tool for estimating the time until specific events, such as death or cancer recurrence, based on baseline observations. This is particularly useful in healthcare to prognostically predict clinically important events based on patient data. However, existing approaches often have limitations; some focus only on ranking patients by survivability, neglecting to estimate the actual event time, while others treat the problem as a classification task, ignoring the inherent time-ordered structure of the events. Furthermore, the effective utilization of censored samples - training data points where the exact event time is unknown - is essential for improving the predictive accuracy of the model. In this paper, we introduce CenTime, a novel approach to survival analysis that directly estimates the time to event. Our method features an innovative event-conditional censoring mechanism that performs robustly even when uncensored data is scarce. We demonstrate that our approach forms a consistent estimator for the event model parameters, even in the absence of uncensored data. Furthermore, CenTime is easily integrated with deep learning models with no restrictions on batch size or the number of uncensored samples. We compare our approach with standard survival analysis methods, including the Cox proportional-hazard model and DeepHit. Our results indicate that CenTime offers state-of-the-art performance in predicting time-to-death while maintaining comparable ranking performance. Our implementation is publicly available at https://github.com/ahmedhshahin/CenTime.
    VLUCI: Variational Learning of Unobserved Confounders for Counterfactual Inference. (arXiv:2308.00904v2 [cs.LG] UPDATED)
    Causal inference plays a vital role in diverse domains like epidemiology, healthcare, and economics. De-confounding and counterfactual prediction in observational data has emerged as a prominent concern in causal inference research. While existing models tackle observed confounders, the presence of unobserved confounders remains a significant challenge, distorting causal inference and impacting counterfactual outcome accuracy. To address this, we propose a novel variational learning model of unobserved confounders for counterfactual inference (VLUCI), which generates the posterior distribution of unobserved confounders. VLUCI relaxes the unconfoundedness assumption often overlooked by most causal inference methods. By disentangling observed and unobserved confounders, VLUCI constructs a doubly variational inference model to approximate the distribution of unobserved confounders, which are used for inferring more accurate counterfactual outcomes. Extensive experiments on synthetic and semi-synthetic datasets demonstrate VLUCI's superior performance in inferring unobserved confounders. It is compatible with state-of-the-art counterfactual inference models, significantly improving inference accuracy at both group and individual levels. Additionally, VLUCI provides confidence intervals for counterfactual outcomes, aiding decision-making in risk-sensitive domains. We further clarify the considerations when applying VLUCI to cases where unobserved confounders don't strictly conform to our model assumptions using the public IHDP dataset as an example, highlighting the practical advantages of VLUCI.
    DiFaReli: Diffusion Face Relighting. (arXiv:2304.09479v3 [cs.CV] UPDATED)
    We present a novel approach to single-view face relighting in the wild. Handling non-diffuse effects, such as global illumination or cast shadows, has long been a challenge in face relighting. Prior work often assumes Lambertian surfaces, simplified lighting models or involves estimating 3D shape, albedo, or a shadow map. This estimation, however, is error-prone and requires many training examples with lighting ground truth to generalize well. Our work bypasses the need for accurate estimation of intrinsic components and can be trained solely on 2D images without any light stage data, multi-view images, or lighting ground truth. Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We also propose a novel conditioning technique that eases the modeling of the complex interaction between light and geometry by using a rendered shading reference to spatially modulate the DDIM. We achieve state-of-the-art performance on standard benchmark Multi-PIE and can photorealistically relight in-the-wild images. Please visit our page: https://diffusion-face-relighting.github.io
    Examining the Effectiveness of Chatbots in Gathering Family History Information in Comparison to the Standard In-Person Interview-Based Approach. (arXiv:2309.03223v1 [cs.HC])
    One of the most common things that a genealogist is tasked with is the gathering of a person's initial family history, normally via in-person interviews or with the use of a platform such as ancestry.com, as this can provide a strong foundation upon which a genealogist may build. However, the ability to conduct these interviews can often be hindered by both geographical constraints and the technical proficiency of the interviewee, as the interviewee in these types of interviews is most often an elderly person with a lower than average level of technical proficiency. With this in mind, this study presents what we believe, based on prior research, to be the first chatbot geared entirely towards the gathering of family histories, and explores the viability of utilising such a chatbot by comparing the performance and usability of such a method with the aforementioned alternatives. With a chatbot-based approach, we show that, though the average time taken to conduct an interview may be longer than if the user had used ancestry.com or participated in an in-person interview, the number of mistakes made and the level of confusion from the user regarding the UI and process required is lower than the other two methods. Note that the final metric regarding the user's confusion is not applicable for the in-person interview sessions due to its lack of a UI. With refinement, we believe this use of a chatbot could be a valuable tool for genealogists, especially when dealing with interviewees who are based in other countries where it is not possible to conduct an in-person interview.
    Neural lasso: a unifying approach of lasso and neural networks. (arXiv:2309.03770v1 [stat.ML])
    In recent years, there is a growing interest in combining techniques attributed to the areas of Statistics and Machine Learning in order to obtain the benefits of both approaches. In this article, the statistical technique lasso for variable selection is represented through a neural network. It is observed that, although both the statistical approach and its neural version have the same objective function, they differ due to their optimization. In particular, the neural version is usually optimized in one-step using a single validation set, while the statistical counterpart uses a two-step optimization based on cross-validation. The more elaborated optimization of the statistical method results in more accurate parameter estimation, especially when the training set is small. For this reason, a modification of the standard approach for training neural networks, that mimics the statistical framework, is proposed. During the development of the above modification, a new optimization algorithm for identifying the significant variables emerged. Experimental results, using synthetic and real data sets, show that this new optimization algorithm achieves better performance than any of the three previous optimization approaches.
    Models of human preference for learning reward functions. (arXiv:2206.02231v3 [cs.LG] UPDATED)
    The utility of reinforcement learning is limited by the alignment of reward functions with the interests of human stakeholders. One promising method for alignment is to learn the reward function from human-generated preferences between pairs of trajectory segments, a type of reinforcement learning from human feedback (RLHF). These human preferences are typically assumed to be informed solely by partial return, the sum of rewards along each segment. We find this assumption to be flawed and propose modeling human preferences instead as informed by each segment's regret, a measure of a segment's deviation from optimal decision-making. Given infinitely many preferences generated according to regret, we prove that we can identify a reward function equivalent to the reward function that generated those preferences, and we prove that the previous partial return model lacks this identifiability property in multiple contexts. We empirically show that our proposed regret preference model outperforms the partial return preference model with finite training data in otherwise the same setting. Additionally, we find that our proposed regret preference model better predicts real human preferences and also learns reward functions from these preferences that lead to policies that are better human-aligned. Overall, this work establishes that the choice of preference model is impactful, and our proposed regret preference model provides an improvement upon a core assumption of recent research. We have open sourced our experimental code, the human preferences dataset we gathered, and our training and preference elicitation interfaces for gathering a such a dataset.
    Internet Explorer: Targeted Representation Learning on the Open Web. (arXiv:2302.14051v2 [cs.LG] UPDATED)
    Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets. These general-purpose models only capture the knowledge within their pre-training datasets, which are tiny, out-of-date snapshots of the Internet -- where billions of images are uploaded each day. We suggest an alternate approach: rather than hoping our static datasets transfer to our desired tasks after large-scale pre-training, we propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand. Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset. It cycles between searching for images on the Internet with text queries, self-supervised training on downloaded images, determining which images were useful, and prioritizing what to search for next. We evaluate Internet Explorer across several datasets and show that it outperforms or matches CLIP oracle performance by using just a single GPU desktop to actively query the Internet for 30--40 hours. Results, visualizations, and videos at https://internet-explorer-ssl.github.io/
    Off-policy Evaluation in Doubly Inhomogeneous Environments. (arXiv:2306.08719v2 [stat.ME] UPDATED)
    This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions -- temporal stationarity and individual homogeneity are both violated. To handle the ``double inhomogeneities", we propose a class of latent factor models for the reward and observation transition functions, under which we develop a general OPE framework that consists of both model-based and model-free approaches. To our knowledge, this is the first paper that develops statistically sound OPE methods in offline RL with double inhomogeneities. It contributes to a deeper understanding of OPE in environments, where standard RL assumptions are not met, and provides several practical approaches in these settings. We establish the theoretical properties of the proposed value estimators and empirically show that our approach outperforms competing methods that ignore either temporal nonstationarity or individual heterogeneity. Finally, we illustrate our method on a data set from the Medical Information Mart for Intensive Care.
    Evaluation of Reinforcement Learning Techniques for Trading on a Diverse Portfolio. (arXiv:2309.03202v1 [q-fin.TR])
    This work seeks to answer key research questions regarding the viability of reinforcement learning over the S&P 500 index. The on-policy techniques of Value Iteration (VI) and State-action-reward-state-action (SARSA) are implemented along with the off-policy technique of Q-Learning. The models are trained and tested on a dataset comprising multiple years of stock market data from 2000-2023. The analysis presents the results and findings from training and testing the models using two different time periods: one including the COVID-19 pandemic years and one excluding them. The results indicate that including market data from the COVID-19 period in the training dataset leads to superior performance compared to the baseline strategies. During testing, the on-policy approaches (VI and SARSA) outperform Q-learning, highlighting the influence of bias-variance tradeoff and the generalization capabilities of simpler policies. However, it is noted that the performance of Q-learning may vary depending on the stability of future market conditions. Future work is suggested, including experiments with updated Q-learning policies during testing and trading diverse individual stocks. Additionally, the exploration of alternative economic indicators for training the models is proposed.
    How adversarial attacks can disrupt seemingly stable accurate classifiers. (arXiv:2309.03665v1 [cs.LG])
    Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data. We introduce a simple generic and generalisable framework for which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from being detected using randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.
    LB-SimTSC: An Efficient Similarity-Aware Graph Neural Network for Semi-Supervised Time Series Classification. (arXiv:2301.04838v3 [cs.LG] UPDATED)
    Time series classification is an important data mining task that has received a lot of interest in the past two decades. Due to the label scarcity in practice, semi-supervised time series classification with only a few labeled samples has become popular. Recently, Similarity-aware Time Series Classification (SimTSC) is proposed to address this problem by using a graph neural network classification model on the graph generated from pairwise Dynamic Time Warping (DTW) distance of batch data. It shows excellent accuracy and outperforms state-of-the-art deep learning models in several few-label settings. However, since SimTSC relies on pairwise DTW distances, the quadratic complexity of DTW limits its usability to only reasonably sized datasets. To address this challenge, we propose a new efficient semi-supervised time series classification technique, LB-SimTSC, with a new graph construction module. Instead of using DTW, we propose to utilize a lower bound of DTW, LB_Keogh, to approximate the dissimilarity between instances in linear time, while retaining the relative proximity relationships one would have obtained via computing DTW. We construct the pairwise distance matrix using LB_Keogh and build a graph for the graph neural network. We apply this approach to the ten largest datasets from the well-known UCR time series classification archive. The results demonstrate that this approach can be up to 104x faster than SimTSC when constructing the graph on large datasets without significantly decreasing classification accuracy.  ( 3 min )
    Learning a Patent-Informed Biomedical Knowledge Graph Reveals Technological Potential of Drug Repositioning Candidates. (arXiv:2309.03227v1 [cs.AI])
    Drug repositioning-a promising strategy for discovering new therapeutic uses for existing drugs-has been increasingly explored in the computational science literature using biomedical databases. However, the technological potential of drug repositioning candidates has often been overlooked. This study presents a novel protocol to comprehensively analyse various sources such as pharmaceutical patents and biomedical databases, and identify drug repositioning candidates with both technological potential and scientific evidence. To this end, first, we constructed a scientific biomedical knowledge graph (s-BKG) comprising relationships between drugs, diseases, and genes derived from biomedical databases. Our protocol involves identifying drugs that exhibit limited association with the target disease but are closely located in the s-BKG, as potential drug candidates. We constructed a patent-informed biomedical knowledge graph (p-BKG) by adding pharmaceutical patent information. Finally, we developed a graph embedding protocol to ascertain the structure of the p-BKG, thereby calculating the relevance scores of those candidates with target disease-related patents to evaluate their technological potential. Our case study on Alzheimer's disease demonstrates its efficacy and feasibility, while the quantitative outcomes and systematic methods are expected to bridge the gap between computational discoveries and successful market applications in drug repositioning research.  ( 2 min )
    Insights Into the Inner Workings of Transformer Models for Protein Function Prediction. (arXiv:2309.03631v1 [cs.LG])
    Motivation: We explored how explainable AI (XAI) can help to shed light into the inner workings of neural networks for protein function prediction, by extending the widely used XAI method of integrated gradients such that latent representations inside of transformer models, which were finetuned to Gene Ontology term and Enzyme Commission number prediction, can be inspected too. Results: The approach enabled us to identify amino acids in the sequences that the transformers pay particular attention to, and to show that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside of the model, where we identified transformer heads with a statistically significant correspondence of attribution maps with ground truth sequence annotations (e.g., transmembrane regions, active sites) across many proteins. Availability and Implementation: Source code can be accessed at https://github.com/markuswenzel/xai-proteins .  ( 2 min )
    Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples. (arXiv:2309.03847v1 [stat.ML])
    We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small'' cover [BKSW19] with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover [AAL21].  ( 2 min )
    MVD:A Novel Methodology and Dataset for Acoustic Vehicle Type Classification. (arXiv:2309.03544v1 [cs.SD])
    Rising urban populations have led to a surge in vehicle use and made traffic monitoring and management indispensable. Acoustic traffic monitoring (ATM) offers a cost-effective and efficient alternative to more computationally expensive methods of monitoring traffic such as those involving computer vision technologies. In this paper, we present MVD and MVDA: two open datasets for the development of acoustic traffic monitoring and vehicle-type classification algorithms, which contain audio recordings of moving vehicles. The dataset contain four classes- Trucks, Cars, Motorbikes, and a No-vehicle class. Additionally, we propose a novel and efficient way to accurately classify these acoustic signals using cepstrum and spectrum based local and global audio features, and a multi-input neural network. Experimental results show that our methodology improves upon the established baselines of previous works and achieves an accuracy of 91.98% and 96.66% on MVD and MVDA Datasets, respectively. Finally, the proposed model was deployed through an Android application to make it accessible for testing and demonstrate its efficacy.  ( 2 min )
    DiffDefense: Defending against Adversarial Attacks via Diffusion Models. (arXiv:2309.03702v1 [cs.LG])
    This paper presents a novel reconstruction method that leverages Diffusion Models to protect machine learning classifiers against adversarial attacks, all without requiring any modifications to the classifiers themselves. The susceptibility of machine learning models to minor input perturbations renders them vulnerable to adversarial attacks. While diffusion-based methods are typically disregarded for adversarial defense due to their slow reverse process, this paper demonstrates that our proposed method offers robustness against adversarial threats while preserving clean accuracy, speed, and plug-and-play compatibility. Code at: https://github.com/HondamunigePrasannaSilva/DiffDefence.  ( 2 min )
    Kernelized Concept Erasure. (arXiv:2201.12191v4 [cs.LG] UPDATED)
    The representation space of neural models for textual data emerges in an unsupervised manner during training. Understanding how those representations encode human-interpretable concepts is a fundamental problem. One prominent approach for the identification of concepts in neural representations is searching for a linear subspace whose erasure prevents the prediction of the concept from the representations. However, while many linear erasure algorithms are tractable and interpretable, neural networks do not necessarily represent concepts in a linear manner. To identify non-linearly encoded concepts, we propose a kernelization of a linear minimax game for concept erasure. We demonstrate that it is possible to prevent specific non-linear adversaries from predicting the concept. However, the protection does not transfer to different nonlinear adversaries. Therefore, exhaustively erasing a non-linearly encoded concept remains an open problem.  ( 2 min )
    Retail store customer behavior analysis system: Design and Implementation. (arXiv:2309.03232v1 [cs.LG])
    Understanding customer behavior in retail stores plays a crucial role in improving customer satisfaction by adding personalized value to services. Behavior analysis reveals both general and detailed patterns in the interaction of customers with a store items and other people, providing store managers with insight into customer preferences. Several solutions aim to utilize this data by recognizing specific behaviors through statistical visualization. However, current approaches are limited to the analysis of small customer behavior sets, utilizing conventional methods to detect behaviors. They do not use deep learning techniques such as deep neural networks, which are powerful methods in the field of computer vision. Furthermore, these methods provide limited figures when visualizing the behavioral data acquired by the system. In this study, we propose a framework that includes three primary parts: mathematical modeling of customer behaviors, behavior analysis using an efficient deep learning based system, and individual and group behavior visualization. Each module and the entire system were validated using data from actual situations in a retail store.  ( 2 min )
    PGFed: Personalize Each Client's Global Objective for Federated Learning. (arXiv:2212.01448v2 [cs.LG] UPDATED)
    Personalized federated learning has received an upsurge of attention due to the mediocre performance of conventional federated learning (FL) over heterogeneous data. Unlike conventional FL which trains a single global consensus model, personalized FL allows different models for different clients. However, existing personalized FL algorithms only implicitly transfer the collaborative knowledge across the federation by embedding the knowledge into the aggregated model or regularization. We observed that this implicit knowledge transfer fails to maximize the potential of each client's empirical risk toward other clients. Based on our observation, in this work, we propose Personalized Global Federated Learning (PGFed), a novel personalized FL framework that enables each client to personalize its own global objective by explicitly and adaptively aggregating the empirical risks of itself and other clients. To avoid massive (O(N^2)) communication overhead and potential privacy leakage while achieving this, each client's risk is estimated through a first-order approximation for other clients' adaptive risk aggregation. On top of PGFed, we develop a momentum upgrade, dubbed PGFedMo, to more efficiently utilize clients' empirical risks. Our extensive experiments on four datasets under different federated settings show consistent improvements of PGFed over previous state-of-the-art methods. The code is publicly available at https://github.com/ljaiverson/pgfed.  ( 2 min )
    Auto-SDE: Learning effective reduced dynamics from data-driven stochastic dynamical systems. (arXiv:2205.04151v2 [stat.ML] UPDATED)
    Multiscale stochastic dynamical systems have been widely adopted to scientific and engineering problems due to their capability of depicting complex phenomena in many real world applications. This work is devoted to investigating the effective reduced dynamics for a slow-fast stochastic dynamical system. Given observation data on a short-term period satisfying some unknown slow-fast stochastic system, we propose a novel algorithm including a neural network called Auto-SDE to learn invariant slow manifold. Our approach captures the evolutionary nature of a series of time-dependent autoencoder neural networks with the loss constructed from a discretized stochastic differential equation. Our algorithm is also proved to be accurate, stable and effective through numerical experiments under various evaluation metrics.  ( 2 min )
    A Function Interpretation Benchmark for Evaluating Interpretability Methods. (arXiv:2309.03886v1 [cs.CL])
    Labeling neural network submodules with human-legible descriptions is useful for many downstream tasks: such descriptions can surface failures, guide interventions, and perhaps even explain important model behaviors. To date, most mechanistic descriptions of trained networks have involved small models, narrowly delimited phenomena, and large amounts of human labor. Labeling all human-interpretable sub-computations in models of increasing size and complexity will almost certainly require tools that can generate and validate descriptions automatically. Recently, techniques that use learned models in-the-loop for labeling have begun to gain traction, but methods for evaluating their efficacy are limited and ad-hoc. How should we validate and compare open-ended labeling tools? This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating the building blocks of automated interpretability methods. FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate. The functions are procedurally constructed across textual and numeric domains, and involve a range of real-world complexities, including noise, composition, approximation, and bias. We evaluate new and existing methods that use language models (LMs) to produce code-based and language descriptions of function behavior. We find that an off-the-shelf LM augmented with only black-box access to functions can sometimes infer their structure, acting as a scientist by forming hypotheses, proposing experiments, and updating descriptions in light of new data. However, LM-based descriptions tend to capture global function behavior and miss local corruptions. These results show that FIND will be useful for characterizing the performance of more sophisticated interpretability methods before they are applied to real-world models.  ( 3 min )
    Early warning via transitions in latent stochastic dynamical systems. (arXiv:2309.03842v1 [stat.ML])
    Early warnings for dynamical transitions in complex systems or high-dimensional observation data are essential in many real world applications, such as gene mutation, brain diseases, natural disasters, financial crises, and engineering reliability. To effectively extract early warning signals, we develop a novel approach: the directed anisotropic diffusion map that captures the latent evolutionary dynamics in low-dimensional manifold. Applying the methodology to authentic electroencephalogram (EEG) data, we successfully find the appropriate effective coordinates, and derive early warning signals capable of detecting the tipping point during the state transition. Our method bridges the latent dynamics with the original dataset. The framework is validated to be accurate and effective through numerical experiments, in terms of density and transition probability. It is shown that the second coordinate holds meaningful information for critical transition in various evaluation metrics.  ( 2 min )
    XGen-7B Technical Report. (arXiv:2309.03450v1 [cs.CL])
    Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context. To address this, we have trained XGen, a series of 7B parameter models on up to 8K sequence length for up to 1.5T tokens. We have also finetuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancements and commercial applications. Our evaluation on standard benchmarks shows that XGen models achieve comparable or better results when compared with state-of-the-art open-source LLMs. Our targeted evaluation on long sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs.  ( 2 min )
    ArtHDR-Net: Perceptually Realistic and Accurate HDR Content Creation. (arXiv:2309.03827v1 [cs.CV])
    High Dynamic Range (HDR) content creation has become an important topic for modern media and entertainment sectors, gaming and Augmented/Virtual Reality industries. Many methods have been proposed to recreate the HDR counterparts of input Low Dynamic Range (LDR) images/videos given a single exposure or multi-exposure LDRs. The state-of-the-art methods focus primarily on the preservation of the reconstruction's structural similarity and the pixel-wise accuracy. However, these conventional approaches do not emphasize preserving the artistic intent of the images in terms of human visual perception, which is an essential element in media, entertainment and gaming. In this paper, we attempt to study and fill this gap. We propose an architecture called ArtHDR-Net based on a Convolutional Neural Network that uses multi-exposed LDR features as input. Experimental results show that ArtHDR-Net can achieve state-of-the-art performance in terms of the HDR-VDP-2 score (i.e., mean opinion score index) while reaching competitive performance in terms of PSNR and SSIM.  ( 2 min )
    Subgraph-based Tight Frames on Graphs with Compact Supports and Vanishing Moments. (arXiv:2309.03537v1 [eess.SP])
    In this work, we proposed a novel and general method to construct tight frames on graphs with compact supports based on a series of hierarchical partitions. Starting from our abstract construction that generalizes previous methods based on partition trees, we are able to flexibly incorporate subgraph Laplacians into our design of graph frames. Consequently, our general methods permit adjusting the (subgraph) vanishing moments of the framelets and extra properties, such as directionality, for efficiently representing graph signals with path-like supports. Several variants are explicitly defined and tested. Experimental results show our proposed graph frames perform superiorly in non-linear approximation tasks.  ( 2 min )
    Graph Theory Applications in Advanced Geospatial Research. (arXiv:2309.03249v1 [cs.LG])
    Geospatial sciences include a wide range of applications, from environmental monitoring transportation to infrastructure planning, as well as location-based analysis and services. Graph theory algorithms in mathematics have emerged as indispensable tools in these domains due to their capability to model and analyse spatial relationships efficiently. This technical report explores the applications of graph theory algorithms in geospatial sciences, highlighting their role in network analysis, spatial connectivity, geographic information systems, and various other spatial problem-solving scenarios. It provides a comprehensive idea about the key concepts and algorithms of graph theory that assist the modelling processes. The report provides insights into the practical significance of graph theory in addressing real-world geospatial challenges and opportunities. It lists the extensive research, innovative technologies and methodologies implemented in this field.  ( 2 min )
    Cross-Task Attention Network: Improving Multi-Task Learning for Medical Imaging Applications. (arXiv:2309.03837v1 [cs.CV])
    Multi-task learning (MTL) is a powerful approach in deep learning that leverages the information from multiple tasks during training to improve model performance. In medical imaging, MTL has shown great potential to solve various tasks. However, existing MTL architectures in medical imaging are limited in sharing information across tasks, reducing the potential performance improvements of MTL. In this study, we introduce a novel attention-based MTL framework to better leverage inter-task interactions for various tasks from pixel-level to image-level predictions. Specifically, we propose a Cross-Task Attention Network (CTAN) which utilizes cross-task attention mechanisms to incorporate information by interacting across tasks. We validated CTAN on four medical imaging datasets that span different domains and tasks including: radiation treatment planning prediction using planning CT images of two different target cancers (Prostate, OpenKBP); pigmented skin lesion segmentation and diagnosis using dermatoscopic images (HAM10000); and COVID-19 diagnosis and severity prediction using chest CT scans (STOIC). Our study demonstrates the effectiveness of CTAN in improving the accuracy of medical imaging tasks. Compared to standard single-task learning (STL), CTAN demonstrated a 4.67% improvement in performance and outperformed both widely used MTL baselines: hard parameter sharing (HPS) with an average performance improvement of 3.22%; and multi-task attention network (MTAN) with a relative decrease of 5.38%. These findings highlight the significance of our proposed MTL framework in solving medical imaging tasks and its potential to improve their accuracy across domains.  ( 3 min )
    Companion Animal Disease Diagnostics based on Literal-aware Medical Knowledge Graph Representation Learning. (arXiv:2309.03219v1 [cs.AI])
    Knowledge graph (KG) embedding has been used to benefit the diagnosis of animal diseases by analyzing electronic medical records (EMRs), such as notes and veterinary records. However, learning representations to capture entities and relations with literal information in KGs is challenging as the KGs show heterogeneous properties and various types of literal information. Meanwhile, the existing methods mostly aim to preserve graph structures surrounding target nodes without considering different types of literals, which could also carry significant information. In this paper, we propose a knowledge graph embedding model for the efficient diagnosis of animal diseases, which could learn various types of literal information and graph structure and fuse them into unified representations, namely LiteralKG. Specifically, we construct a knowledge graph that is built from EMRs along with literal information collected from various animal hospitals. We then fuse different types of entities and node feature information into unified vector representations through gate networks. Finally, we propose a self-supervised learning task to learn graph structure in pretext tasks and then towards various downstream tasks. Experimental results on link prediction tasks demonstrate that our model outperforms the baselines that consist of state-of-the-art models. The source code is available at https://github.com/NSLab-CUK/LiteralKG.  ( 2 min )
    EvoCLINICAL: Evolving Cyber-Cyber Digital Twin with Active Transfer Learning for Automated Cancer Registry System. (arXiv:2309.03246v1 [cs.LG])
    The Cancer Registry of Norway (CRN) collects information on cancer patients by receiving cancer messages from different medical entities (e.g., medical labs, and hospitals) in Norway. Such messages are validated by an automated cancer registry system: GURI. Its correct operation is crucial since it lays the foundation for cancer research and provides critical cancer-related statistics to its stakeholders. Constructing a cyber-cyber digital twin (CCDT) for GURI can facilitate various experiments and advanced analyses of the operational state of GURI without requiring intensive interactions with the real system. However, GURI constantly evolves due to novel medical diagnostics and treatment, technological advances, etc. Accordingly, CCDT should evolve as well to synchronize with GURI. A key challenge of achieving such synchronization is that evolving CCDT needs abundant data labelled by the new GURI. To tackle this challenge, we propose EvoCLINICAL, which considers the CCDT developed for the previous version of GURI as the pretrained model and fine-tunes it with the dataset labelled by querying a new GURI version. EvoCLINICAL employs a genetic algorithm to select an optimal subset of cancer messages from a candidate dataset and query GURI with it. We evaluate EvoCLINICAL on three evolution processes. The precision, recall, and F1 score are all greater than 91%, demonstrating the effectiveness of EvoCLINICAL. Furthermore, we replace the active learning part of EvoCLINICAL with random selection to study the contribution of transfer learning to the overall performance of EvoCLINICAL. Results show that employing active learning in EvoCLINICAL increases its performances consistently.  ( 3 min )
  • Open

    Deep Metric Learning with Chance Constraints. (arXiv:2209.09060v3 [cs.CV] CROSS LISTED)
    Deep metric learning (DML) aims to minimize empirical expected loss of the pairwise intra-/inter- class proximity violations in the embedding space. We relate DML to feasibility problem of finite chance constraints. We show that minimizer of proxy-based DML satisfies certain chance constraints, and that the worst case generalization performance of the proxy-based methods can be characterized by the radius of the smallest ball around a class proxy to cover the entire domain of the corresponding class samples, suggesting multiple proxies per class helps performance. To provide a scalable algorithm as well as exploiting more proxies, we consider the chance constraints implied by the minimizers of proxy-based DML instances and reformulate DML as finding a feasible point in intersection of such constraints, resulting in a problem to be approximately solved by iterative projections. Simply put, we repeatedly train a regularized proxy-based loss and re-initialize the proxies with the embeddings of the deliberately selected new samples. We applied our method with 4 well-accepted DML losses and show the effectiveness with extensive evaluations on 4 popular DML benchmarks. Code is available at: https://github.com/yetigurbuz/ccp-dml
    Neural lasso: a unifying approach of lasso and neural networks. (arXiv:2309.03770v1 [stat.ML])
    In recent years, there is a growing interest in combining techniques attributed to the areas of Statistics and Machine Learning in order to obtain the benefits of both approaches. In this article, the statistical technique lasso for variable selection is represented through a neural network. It is observed that, although both the statistical approach and its neural version have the same objective function, they differ due to their optimization. In particular, the neural version is usually optimized in one-step using a single validation set, while the statistical counterpart uses a two-step optimization based on cross-validation. The more elaborated optimization of the statistical method results in more accurate parameter estimation, especially when the training set is small. For this reason, a modification of the standard approach for training neural networks, that mimics the statistical framework, is proposed. During the development of the above modification, a new optimization algorithm for identifying the significant variables emerged. Experimental results, using synthetic and real data sets, show that this new optimization algorithm achieves better performance than any of the three previous optimization approaches.
    Enhancing Pipeline-Based Conversational Agents with Large Language Models. (arXiv:2309.03748v1 [cs.CL])
    The latest advancements in AI and deep learning have led to a breakthrough in large language model (LLM)-based agents such as GPT-4. However, many commercial conversational agent development tools are pipeline-based and have limitations in holding a human-like conversation. This paper investigates the capabilities of LLMs to enhance pipeline-based conversational agents during two phases: 1) in the design and development phase and 2) during operations. In 1) LLMs can aid in generating training data, extracting entities and synonyms, localization, and persona design. In 2) LLMs can assist in contextualization, intent classification to prevent conversational breakdown and handle out-of-scope questions, auto-correcting utterances, rephrasing responses, formulating disambiguation questions, summarization, and enabling closed question-answering capabilities. We conducted informal experiments with GPT-4 in the private banking domain to demonstrate the scenarios above with a practical example. Companies may be hesitant to replace their pipeline-based agents with LLMs entirely due to privacy concerns and the need for deep integration within their existing ecosystems. A hybrid approach in which LLMs' are integrated into the pipeline-based agents allows them to save time and costs of building and running agents by capitalizing on the capabilities of LLMs while retaining the integration and privacy safeguards of their existing systems.
    Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders. (arXiv:2202.09671v4 [stat.ML] UPDATED)
    Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain. However, this approach is slow and costly because it needs many forward and reverse steps. We propose a faster and cheaper approach that adds noise not until the data become pure random noise, but until they reach a hidden noisy data distribution that we can confidently learn. Then, we use fewer reverse steps to generate data by starting from this hidden distribution that is made similar to the noisy data. We reveal that the proposed model can be cast as an adversarial auto-encoder empowered by both the diffusion process and a learnable implicit prior. Experimental results show even with a significantly smaller number of reverse diffusion steps, the proposed truncated diffusion probabilistic models can provide consistent improvements over the non-truncated ones in terms of performance in both unconditional and text-guided image generations.
    Explanation Shift: How Did the Distribution Shift Impact the Model?. (arXiv:2303.08081v2 [cs.LG] UPDATED)
    As input data distributions evolve, the predictive performance of machine learning models tends to deteriorate. In practice, new input data tend to come without target labels. Then, state-of-the-art techniques model input data distributions or model prediction distributions and try to understand issues regarding the interactions between learned models and shifting distributions. We suggest a novel approach that models how explanation characteristics shift when affected by distribution shifts. We find that the modeling of explanation shifts can be a better indicator for detecting out-of-distribution model behaviour than state-of-the-art techniques. We analyze different types of distribution shifts using synthetic examples and real-world data sets. We provide an algorithmic method that allows us to inspect the interaction between data set features and learned models and compare them to the state-of-the-art. We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.
    Comparing Sequential Forecasters. (arXiv:2110.00115v5 [stat.ME] UPDATED)
    Consider two forecasters, each making a single prediction for a sequence of events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post-hoc, while avoiding unverifiable assumptions on how the forecasts and outcomes were generated? In this paper, we present a rigorous answer to this question by designing novel sequential inference procedures for estimating the time-varying difference in forecast scores. To do this, we employ confidence sequences (CS), which are sequences of confidence intervals that can be continuously monitored and are valid at arbitrary data-dependent stopping times ("anytime-valid"). The widths of our CSs are adaptive to the underlying variance of the score differences. Underlying their construction is a game-theoretic statistical framework, in which we further identify e-processes and p-processes for sequentially testing a weak null hypothesis -- whether one forecaster outperforms another on average (rather than always). Our methods do not make distributional assumptions on the forecasts or outcomes; our main theorems apply to any bounded scores, and we later provide alternative methods for unbounded scores. We empirically validate our approaches by comparing real-world baseball and weather forecasters.
    Improved theoretical guarantee for rank aggregation via spectral method. (arXiv:2309.03808v1 [stat.ML])
    Given pairwise comparisons between multiple items, how to rank them so that the ranking matches the observations? This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications. As it is generally NP-hard to find a global ranking that minimizes the mismatch (known as the Kemeny optimization), we focus on the Erd\"os-R\'enyi outliers (ERO) model for this ranking problem. Here, each pairwise comparison is a corrupted copy of the true score difference. We investigate spectral ranking algorithms that are based on unnormalized and normalized data matrices. The key is to understand their performance in recovering the underlying scores of each item from the observed data. This reduces to deriving an entry-wise perturbation error bound between the top eigenvectors of the unnormalized/normalized data matrix and its population counterpart. By using the leave-one-out technique, we provide a sharper $\ell_{\infty}$-norm perturbation bound of the eigenvectors and also derive an error bound on the maximum displacement for each item, with only $\Omega(n\log n)$ samples. Our theoretical analysis improves upon the state-of-the-art results in terms of sample complexity, and our numerical experiments confirm these theoretical findings.
    BoXHED2.0: Scalable boosting of dynamic survival analysis. (arXiv:2103.12591v5 [cs.LG] UPDATED)
    Modern applications of survival analysis increasingly involve time-dependent covariates. The Python package BoXHED2.0 is a tree-boosted hazard estimator that is fully nonparametric, and is applicable to survival settings far more general than right-censoring, including recurring events and competing risks. BoXHED2.0 is also scalable to the point of being on the same order of speed as parametric boosted survival models, in part because its core is written in C++ and it also supports the use of GPUs and multicore CPUs. BoXHED2.0 is available from PyPI and also from www.github.com/BoXHED.
    Ensemble linear interpolators: The role of ensembling. (arXiv:2309.03354v1 [stat.ML])
    Interpolators are unstable. For example, the mininum $\ell_2$ norm least square interpolator exhibits unbounded test errors when dealing with noisy data. In this paper, we study how ensemble stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator. We focus on bagged linear interpolators, as bagging is a popular randomization-based ensemble method that can be implemented in parallel. We introduce the multiplier-bootstrap-based bagged least square estimator, which can then be formulated as an average of the sketched least square estimators. The proposed multiplier bootstrap encompasses the classical bootstrap with replacement as a special case, along with a more intriguing variant which we call the Bernoulli bootstrap. Focusing on the proportional regime where the sample size scales proportionally with the feature dimensionality, we investigate the out-of-sample prediction risks of the sketched and bagged least square estimators in both underparametrized and overparameterized regimes. Our results reveal the statistical roles of sketching and bagging. In particular, sketching modifies the aspect ratio and shifts the interpolation threshold of the minimum $\ell_2$ norm estimator. However, the risk of the sketched estimator continues to be unbounded around the interpolation threshold due to excessive variance. In stark contrast, bagging effectively mitigates this variance, leading to a bounded limiting out-of-sample prediction risk. To further understand this stability improvement property, we establish that bagging acts as a form of implicit regularization, substantiated by the equivalence of the bagged estimator with its explicitly regularized counterpart. We also discuss several extensions.
    Proper Learning of Linear Dynamical Systems as a Non-Commutative Polynomial Optimisation Problem. (arXiv:2002.01444v5 [math.OC] UPDATED)
    There has been much recent progress in forecasting the next observation of a linear dynamical system (LDS), which is known as the improper learning, as well as in the estimation of its system matrices, which is known as the proper learning of LDS. We present an approach to proper learning of LDS, which in spite of the non-convexity of the problem, guarantees global convergence of numerical solutions to a least-squares estimator. We present promising computational results.
    Bridging the Gap Between Target Networks and Functional Regularization. (arXiv:2106.02613v4 [stat.ML] UPDATED)
    Bootstrapping is behind much of the successes of deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still misunderstood. In this work, we show that they act as an implicit regularizer which can be beneficial in some cases, but also have disadvantages such as being inflexible and can result in instabilities, even when vanilla TD(0) converges. To overcome these issues, we propose an explicit Functional Regularization alternative that is flexible and a convex regularizer in function space and we theoretically study its convergence. We conduct an experimental study across a range of environments, discount factors, and off-policiness data collections to investigate the effectiveness of the regularization induced by Target Networks and Functional Regularization in terms of performance, accuracy, and stability. Our findings emphasize that Functional Regularization can be used as a drop-in replacement for Target Networks and result in performance improvement. Furthermore, adjusting both the regularization weight and the network update period in Functional Regularization can result in further performance improvements compared to solely adjusting the network update period as typically done with Target Networks. Our approach also enhances the ability to networks to recover accurate $Q$-values.
    A Probabilistic Semi-Supervised Approach with Triplet Markov Chains. (arXiv:2309.03707v1 [stat.ML])
    Triplet Markov chains are general generative models for sequential data which take into account three kinds of random variables: (noisy) observations, their associated discrete labels and latent variables which aim at strengthening the distribution of the observations and their associated labels. However, in practice, we do not have at our disposal all the labels associated to the observations to estimate the parameters of such models. In this paper, we propose a general framework based on a variational Bayesian inference to train parameterized triplet Markov chain models in a semi-supervised context. The generality of our approach enables us to derive semi-supervised algorithms for a variety of generative models for sequential Bayesian classification.
    Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming. (arXiv:2209.08901v3 [math.OC] UPDATED)
    The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has been recently extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to increase performance as well as solution quality. In this paper, we propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower bound routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small-size instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy which enforces pairwise constraints, we reduce the complexity of the problems arising in the children nodes. For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that the proposed algorithm globally solves, for the first time, real-world instances of size 10 times larger than those solved by state-of-the-art exact methods.
    Empirical Risk Minimization for Losses without Variance. (arXiv:2309.03818v1 [stat.ML])
    This paper considers an empirical risk minimization problem under heavy-tailed settings, where data does not have finite variance, but only has $p$-th moment with $p \in (1,2)$. Instead of using estimation procedure based on truncated observed data, we choose the optimizer by minimizing the risk value. Those risk values can be robustly estimated via using the remarkable Catoni's method (Catoni, 2012). Thanks to the structure of Catoni-type influence functions, we are able to establish excess risk upper bounds via using generalized generic chaining methods. Moreover, we take computational issues into consideration. We especially theoretically investigate two types of optimization methods, robust gradient descent algorithm and empirical risk-based methods. With an extensive numerical study, we find that the optimizer based on empirical risks via Catoni-style estimation indeed shows better performance than other baselines. It indicates that estimation directly based on truncated data may lead to unsatisfactory results.
    Early warning via transitions in latent stochastic dynamical systems. (arXiv:2309.03842v1 [stat.ML])
    Early warnings for dynamical transitions in complex systems or high-dimensional observation data are essential in many real world applications, such as gene mutation, brain diseases, natural disasters, financial crises, and engineering reliability. To effectively extract early warning signals, we develop a novel approach: the directed anisotropic diffusion map that captures the latent evolutionary dynamics in low-dimensional manifold. Applying the methodology to authentic electroencephalogram (EEG) data, we successfully find the appropriate effective coordinates, and derive early warning signals capable of detecting the tipping point during the state transition. Our method bridges the latent dynamics with the original dataset. The framework is validated to be accurate and effective through numerical experiments, in terms of density and transition probability. It is shown that the second coordinate holds meaningful information for critical transition in various evaluation metrics.
    Auto-SDE: Learning effective reduced dynamics from data-driven stochastic dynamical systems. (arXiv:2205.04151v2 [stat.ML] UPDATED)
    Multiscale stochastic dynamical systems have been widely adopted to scientific and engineering problems due to their capability of depicting complex phenomena in many real world applications. This work is devoted to investigating the effective reduced dynamics for a slow-fast stochastic dynamical system. Given observation data on a short-term period satisfying some unknown slow-fast stochastic system, we propose a novel algorithm including a neural network called Auto-SDE to learn invariant slow manifold. Our approach captures the evolutionary nature of a series of time-dependent autoencoder neural networks with the loss constructed from a discretized stochastic differential equation. Our algorithm is also proved to be accurate, stable and effective through numerical experiments under various evaluation metrics.
    Deep Network Approximation: Beyond ReLU to Diverse Activation Functions. (arXiv:2307.06555v3 [cs.LG] UPDATED)
    This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $4N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, at the cost of slightly larger constants.
    Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples. (arXiv:2309.03847v1 [stat.ML])
    We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small'' cover [BKSW19] with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover [AAL21].
    A Tutorial on the Non-Asymptotic Theory of System Identification. (arXiv:2309.03873v1 [eess.SY])
    This tutorial serves as an introduction to recently developed non-asymptotic methods in the theory of -- mainly linear -- system identification. We emphasize tools we deem particularly useful for a range of problems in this domain, such as the covering technique, the Hanson-Wright Inequality and the method of self-normalized martingales. We then employ these tools to give streamlined proofs of the performance of various least-squares based estimators for identifying the parameters in autoregressive models. We conclude by sketching out how the ideas presented herein can be extended to certain nonlinear identification problems.
    Gradient-Based Feature Learning under Structured Data. (arXiv:2309.03843v1 [stat.ML])
    Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods.
    Knowledge Distillation Layer that Lets the Student Decide. (arXiv:2309.02843v1 [cs.CV] CROSS LISTED)
    Typical technique in knowledge distillation (KD) is regularizing the learning of a limited capacity model (student) by pushing its responses to match a powerful model's (teacher). Albeit useful especially in the penultimate layer and beyond, its action on student's feature transform is rather implicit, limiting its practice in the intermediate layers. To explicitly embed the teacher's knowledge in feature transform, we propose a learnable KD layer for the student which improves KD with two distinct abilities: i) learning how to leverage the teacher's knowledge, enabling to discard nuisance information, and ii) feeding forward the transferred knowledge deeper. Thus, the student enjoys the teacher's knowledge during the inference besides training. Formally, we repurpose 1x1-BN-ReLU-1x1 convolution block to assign a semantic vector to each local region according to the template (supervised by the teacher) that the corresponding region of the student matches. To facilitate template learning in the intermediate layers, we propose a novel form of supervision based on the teacher's decisions. Through rigorous experimentation, we demonstrate the effectiveness of our approach on 3 popular classification benchmarks. Code is available at: https://github.com/adagorgun/letKD-framework
    Trinary Decision Trees for missing value handling. (arXiv:2309.03561v1 [stat.ML])
    This paper introduces the Trinary decision tree, an algorithm designed to improve the handling of missing data in decision tree regressors and classifiers. Unlike other approaches, the Trinary decision tree does not assume that missing values contain any information about the response. Both theoretical calculations on estimator bias and numerical illustrations using real data sets are presented to compare its performance with established algorithms in different missing data scenarios (Missing Completely at Random (MCAR), and Informative Missingness (IM)). Notably, the Trinary tree outperforms its peers in MCAR settings, especially when data is only missing out-of-sample, while lacking behind in IM settings. A hybrid model, the TrinaryMIA tree, which combines the Trinary tree and the Missing In Attributes (MIA) approach, shows robust performance in all types of missingness. Despite the potential drawback of slower training speed, the Trinary tree offers a promising and more accurate method of handling missing data in decision tree algorithms.
    Copula Representations and Error Surface Projections for the Exclusive Or Problem. (arXiv:1907.04483v2 [cs.LG] UPDATED)
    The exclusive or (xor) function is one of the simplest examples that illustrate why nonlinear feedforward networks are superior to linear regression for machine learning applications. We review the xor representation and approximation problems and discuss their solutions in terms of probabilistic logic and associative copula functions. After briefly reviewing the specification of feedforward networks, we compare the dynamics of learned error surfaces with different activation functions such as RELU and tanh through a set of colorful three-dimensional charts. The copula representations extend xor from Boolean to real values, thereby providing a convenient way to demonstrate the concept of cross-validation on in-sample and out-sample data sets. Our approach is pedagogical and is meant to be a machine learning prolegomenon.
    Causal thinking for decision making on Electronic Health Records: why and how. (arXiv:2308.01605v3 [stat.ME] UPDATED)
    Accurate predictions, as with machine learning, may not suffice to provide optimal healthcare for every patient. Indeed, prediction can be driven by shortcuts in the data, such as racial biases. Causal thinking is needed for data-driven decisions. Here, we give an introduction to the key elements, focusing on routinely-collected data, electronic health records (EHRs) and claims data. Using such data to assess the value of an intervention requires care: temporal dependencies and existing practices easily confound the causal effect. We present a step-by-step framework to help build valid decision making from real-life patient records by emulating a randomized trial before individualizing decisions, eg with machine learning. Our framework highlights the most important pitfalls and considerations in analysing EHRs or claims data to draw causal conclusions. We illustrate the various choices in studying the effect of albumin on sepsis mortality in the Medical Information Mart for Intensive Care database (MIMIC-IV). We study the impact of various choices at every step, from feature extraction to causal-estimator selection. In a tutorial spirit, the code and the data are openly available.
    Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck. (arXiv:2309.03800v1 [cs.LG])
    This work investigates the nuanced algorithm design choices for deep learning in the presence of computational-statistical gaps. We begin by considering offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron. This lower bound can be interpreted as a multi-resource tradeoff frontier: successful learning can only occur if one is sufficiently rich (large model), knowledgeable (large dataset), patient (many training iterations), or lucky (many random guesses). We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting. Here, width plays the role of parallel search: it amplifies the probability of finding "lottery ticket" neurons, which learn sparse features more sample-efficiently. Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning. We demonstrate improved sample efficiency on tabular classification benchmarks by using wide, sparsely-initialized MLP models; these networks sometimes outperform tuned random forests.
    Medoid Silhouette clustering with automatic cluster number selection. (arXiv:2309.03751v1 [cs.LG])
    The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, provide two fast versions for the direct optimization, and discuss the use to choose the optimal number of clusters. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements FasterPAM. One of the versions guarantees equal results to the original variant and provides a run speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm. Additionally, we provide a variant to choose the optimal number of clusters directly.
    Adversarially Robust Deep Learning with Optimal-Transport-Regularized Divergences. (arXiv:2309.03791v1 [cs.LG])
    We introduce the $ARMOR_D$ methods as novel approaches to enhancing the adversarial robustness of deep learning models. These methods are based on a new class of optimal-transport-regularized divergences, constructed via an infimal convolution between an information divergence and an optimal-transport (OT) cost. We use these as tools to enhance adversarial robustness by maximizing the expected loss over a neighborhood of distributions, a technique known as distributionally robust optimization. Viewed as a tool for constructing adversarial samples, our method allows samples to be both transported, according to the OT cost, and re-weighted, according to the information divergence. We demonstrate the effectiveness of our method on malware detection and image recognition applications and find that, to our knowledge, it outperforms existing methods at enhancing the robustness against adversarial attacks. $ARMOR_D$ yields the robustified accuracy of $98.29\%$ against $FGSM$ and $98.18\%$ against $PGD^{40}$ on the MNIST dataset, reducing the error rate by more than $19.7\%$ and $37.2\%$ respectively compared to prior methods. Similarly, in malware detection, a discrete (binary) data domain, $ARMOR_D$ improves the robustified accuracy under $rFGSM^{50}$ attack compared to the previous best-performing adversarial training methods by $37.0\%$ while lowering false negative and false positive rates by $51.1\%$ and $57.53\%$, respectively.
    On the dynamics of multi agent nonlinear filtering and learning. (arXiv:2309.03557v1 [stat.ML])
    Multiagent systems aim to accomplish highly complex learning tasks through decentralised consensus seeking dynamics and their use has garnered a great deal of attention in the signal processing and computational intelligence societies. This article examines the behaviour of multiagent networked systems with nonlinear filtering/learning dynamics. To this end, a general formulation for the actions of an agent in multiagent networked systems is presented and conditions for achieving a cohesive learning behaviour is given. Importantly, application of the so derived framework in distributed and federated learning scenarios are presented.  ( 2 min )

  • Open

    Differentially private median and more
    Posted by Edith Cohen and Uri Stemmer, Research Scientists, Google Research Differential privacy (DP) is a rigorous mathematical definition of privacy. DP algorithms are randomized to protect user data by ensuring that the probability of any particular output is nearly unchanged when a data point is added or removed. Therefore, the output of a DP algorithm does not disclose the presence of any one data point. There has been significant progress in both foundational research and adoption of differential privacy with contributions such as the Privacy Sandbox and Google Open Source Library. ML and data analytics algorithms can often be described as performing multiple basic computation steps on the same dataset. When each such step is differentially private, so is the output, but with …  ( 94 min )
  • Open

    Instacart boosts AI capacity, readies for IPO with OpenAI's ChatGPT-powered eCommerce search
    On the verge of its IPO, Instacart has introduced major AI-powered features to its Storefront platform and the smart Caper Carts. Main upgrades: conversational search powered by OpenAI's ChatGPT and inbuilt AI models. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/olqtxvwjo3nb1.png?width=750&format=png&auto=webp&s=d8eaefbb9865c51732efc2792ec386610ecd38e6 AI advancements in Instacart's infrastructure Instacart, which holds approximately 22% of the $132 billion US online grocery-delivery market, has been leaning more towards being a tech platform. The new Instacart Storefront, entailing features driven by 150 proprietary AI models, is built on the same core infrastructure as the Instacart app. Customers can engage in open-ended searches on retailers' storefronts via the search bar. AI upgrades in Caper Carts AI-powered Caper Carts by Instacart have been upgraded. Customers can now order directly from their Caper Cart and get informed when their orders are ready. Camera and weight sensor efficiency is enhanced thanks to improved AI models, ensuring a smoother shopping journey and providing an extra layer of security against suspicious activity. (source) P.S. If you want this kind of analysis, delve into the latest updates in AI with our free newsletter, already favored by professionals from Google, Meta, and OpenAI. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    AI girlfriend ads are flooding Instagram and TikTok
    Tech startups are running sexually explicit ads for apps promoting not-safe-for-work experiences on platforms like Facebook, Instagram, and TikTok. These ads feature digitally created potential 'girlfriends' with large breasts and tight clothing, and some even use popular children's TV characters to promote 'NSFW pics' apps. NBC News found 35 app developers running sexually explicit ads on Meta-owned apps, and 14 app developers running similar ads on TikTok. The marketing push is part of an AI gold rush, capitalizing on the surge of interest in AI and benefiting from a double standard that hurts real human sex workers. Researchers believe that the gender-based slant in these ads reflects social media platforms allowing sex-related ads only if the intended audience is men. Meta and TikTok have stepped up their removal of sexually explicit AI ads after NBC News contacted them, but questions remain about how the ads got through their filters in the first place. Similar ads also appear in the Apple and Google app stores, although the extent of advertising there is unknown. Source : https://www.nbcnews.com/tech/social-media/ai-girlfriend-ads-instagram-tiktok-chat-pics-chatgpt-dose-rcna97547 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    AI — weekly megathread!
    News provided by aibrews.com Technology Innovation Institute in Abu Dhabi has released Falcon 180B - a large language model with 180 billion parameters, trained on 3.5 trillion tokens. It's currently the largest openly available model, and rivals proprietary models like PaLM-2. Falcon 180B is 2.5 times larger than Llama 2 and was trained with 4x more compute. It is available for both research and commercial use [Details]. Meta AI released Belebele, a first-of-its-kind multilingual reading comprehension dataset spanning 122 language variants, enabling direct comparison of how well models understand different languages [Details]. Meta AI has published Code Llama’s research paper with more information on training, evaluation results and safety [Paper]. Open Interpreter, an open-source, …  ( 10 min )
    Animating a 2D image in real time
    Hello Everyone, i have recently started working on a project, where I need to animate an image of a face in real time to speak sentences. Essentially I am trying to build a face for my own large language model. I know of Nvidia's Audio2Face and Metahuman, but these are all in 3D and take a lot of time rendering the lip and eye animations. I need something, which works only with a bit of latency. ​ Does anyone know a service or a repo I could use to animate a 2D picture to speak text? submitted by /u/Fabianslife [link] [comments]  ( 9 min )
    Free AI transforms text and images into amazing videos - Pika Labs
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    Would ChatGPT work to help with looking for WFH jobs?/changing careers?
    This is a complete ChatGPT beginner question but has anyone ever downloaded it and used it to help with looking for specific job roles? Mainly WFH related? Or thought about changing careers and used ChatGPT to help with that? I know there are a lot of other ways to go about this but would ChatGPT help with this at all? submitted by /u/jackbowls [link] [comments]  ( 9 min )
    Do you feel endangered by the rise of AI?
    View Poll submitted by /u/MiladMansory [link] [comments]  ( 9 min )
    AI grading and AI screening but no AI for homework/assignments/exam?
    Professors send emails explaining that they use AI but they reviewed the grades from AI to make sure everything is fine. But students can’t use AI and then review the results just make sure everything is fine. submitted by /u/PrettyHappyAndGay [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/7/2023
    A new AI tool developed by startup Delphi allows users to create virtual clones of themselves or anyone else. Users can upload an ID and add various files, such as emails, chat transcripts, and videos, to generate an AI chatbot that mimics their personality.[1] OpenAI will host its first developer conference on November 6.[2] Meta Platforms Inc. today released FACET, a benchmark dataset designed to help researchers audit computer vision models for bias.[3] Australia to require AI-made child abuse material be removed from search results.[4] Sources: [1] https://technotrenz.com/news/a-new-ai-service-allows-for-the-creation-of-a-virtual-version-of-yourself-or-a-loved-one-that-is-capable-of-making-phone-calls-on-your-behalf-2772634.html [2] https://techcrunch.com/2023/09/06/openai-will-host-its-first-developer-conference-on-november-6/ [3] https://siliconangle.com/2023/08/31/meta-releases-facet-dataset-evaluating-ai-fairness/ [4] https://www.reuters.com/technology/australia-require-ai-made-child-abuse-material-be-removed-search-results-2023-09-08/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    AGI will be not feasible any time soon, here's why
    I was thinking today about all the AI hype we have right with somewhat a bunch of new breakthroughs each month, but things not only are getting slower updates, but the updates impacts itself are becoming lesser. If that is not enough, well we have big problems ahead, such as processors are reaching the physical limit, quantum effects disrupting the works, wafers becoming increasing more expensive, the size reduction is no longer adding the same boosts in power and new materials are just far from viable. On top of this we are going meet two other walls, the software and the energy. About the first, as we make better and more complex algorithms for computation the harder it gets to make better ones to squeeze more power and handle more complex tasks. The second, is becoming more real as bi…  ( 12 min )
  • Open

    [P] MLOps for Vercel OpenAI chatbot infrastructure
    I used infrastructure as code (IaC) to provision and deploy Vercel's next-openai example. IaC is useful because it applies the same rigor of application code development to infrastructure provisioning. Instead of manual point and click in a cloud console which can be unrepeatable or error-prone, you just store and change all infrastructure configurations as code in source control . This example uses Pulumi which allows you to write the IaC in Python. https://github.com/aaronkao/vercel-py-openai-chatbot submitted by /u/kao-pulumi [link] [comments]  ( 9 min )
    Why Do You Not Use Open Source LLMs? (Or do you?) [D] (Repost because I made a mistake in the title)
    Reposting because I intended to ask about LLMs, not AI in general, and forgot that I don't need to dumb down the terminology for this sub. Thanks to the people who pointed out that mistake. --- original post --- This is something I'm curious about. I've seen a few people declaring that they're not using open source LLMs because they're GPU-poor, because the models aren't good enough, because the uis/frontends are hard to get started with, etc., and I've been wondering how much these comments and posts reflect the opinions and needs of the community as a whole. So, here's a poll. Answer away if you feel like it. I'm sharing this on a few other subs too (for the sake of greater information gathering) so please don't vote more than once. If your reasoning is not on here, feel free to comment your thoughts. If more than one option describes you, please select the one that describes you the most. View Poll submitted by /u/Heralax_Tekran [link] [comments]  ( 9 min )
    [D] What are good resources for creating NLP algorithms from scratch?
    I'm looking to learn more about concurrency/parallelism, optimization, data structures and algorithms from an NLP perspective. submitted by /u/Al_Miksiki [link] [comments]  ( 9 min )
    [D] Please Help - Machine Learning (ML) Engineers
    Hello Everyone, I'm currently exploring the idea of a solution tailored for ML engineers and technologists. While I have a background in recruiting, I've often found myself dissatisfied with the typical recruitment process. It seems that many recruiters don't always appreciate the importance of working with candidates or understand the impact on people's livelihoods and careers. What I'm proposing is the creation of a career representation firm specifically designed for purpose-driven technologists specializing in data, product, and hardware careers. This firm would advocate for the career interests of the most passionate ML engineers. Our representation would encompass: - Strategic Career Development: Crafting a strategic approach to help engineers secure opportunities aligned with their desired projects and professional development. - Impact Matching: Identifying and connecting engineers with projects and teams where their technical skills, career goals, and personal interests can have the greatest positive impact, ensuring that your work aligns with your values and aspirations. - Industry Leadership: Positioning you as an industry leader by marketing your expertise and securing speaking engagements at conferences and other events, enhancing your professional visibility and reputation. In return for this representation, engineers would commit to a 3% fee deducted from their salary, which would support the services provided by the firm. Would you be interested in participating in such a service? If not, would you consider recommending it to someone you know? If you are in favor of this idea, what makes you believe it would be advantageous for others even if it might not be your preference? Do you think you could personally benefit from this type of career representation? Thanks! submitted by /u/Educational_Bar_6352 [link] [comments]  ( 10 min )
    [R] Algorithm of Thoughts Prompt Engineering Breakdown
    Paper: https://arxiv.org/abs/2308.10379 Saw someone else post about this new prompting method on the sub here so I decided to put together a run down and prompt template. Pretty interesting to see the different methods emerge and how some attempt to simulate how code runs. My rundown -> https://www.prompthub.us/blog/how-algorithm-of-thoughts-prompting-works submitted by /u/dancleary544 [link] [comments]  ( 9 min )
    [R][D] How to implement Sinusoidal Positional Embedding?
    Hi fellow computer scientists, so I've been researching a little about transformers and meanwhile I had to understand sinusoidal positional embedding. I have found two implementations for this, after testing both approaches I found they compute different embeddings for the same position/timestep with the same embedding dimensions... shouldn't it be equal if the position and embedding dimensions are the same? This is getting me confused, because now I don't know which implementation should I consider... Do you have any suggestions to where I can look? Thank you :) submitted by /u/Christs_Elite [link] [comments]  ( 9 min )
    "[Discussion]"
    Hi guys, I'm completely new in this field.. I have a research in civil engineering and need to learn python, machine learning and data analysis as short as possible. Where can I achieve that?? please help me by naming the best courses or any free materials available🙏 submitted by /u/Ok-Upstairs7749 [link] [comments]  ( 9 min )
    [P] Question answering based on book-summaries
    I'm one of those people who always ask questions about movies because there's something they don't get or have forgotten. Especially with more complex stories, like Game of Thrones. At the moment I'm reading Wheel of Time, a rather long fantasy series. I had the idea to build the following WebApp: There is online each chapter of the series summarized separately. So in the WebApp I could ask questions about the content. In addition, I can indicate which chapter I am reading, so that it is ensured not to spoil the user. I want to avoid to train a model. I would prefer to use one of the existing open-source models, like llama. A first, primitive idea: give the LLM all the summaries and the user's question. But this would mean to give all summaries as input every time. Not only that this approach would not be elegant, the restriction in the input size (number of words) would make this possibly even impossible. Feel free to share your ideas how i could solve this :) submitted by /u/Individual-Cause-616 [link] [comments]  ( 9 min )
    Help me with creating dataset from .mat files [D]
    I have so many .mat files in a folder which have two arrays inside each .mat file. that is, for each .mat file, i have a (224*224) array and another (136,1) array. These 224*224 arrays are my X_trains for a model and these corresponding 136*1 arrays are my y_trains (labels). i can read these files as np arrays using scipy's loadmat. My problem is, is there a way to usen tf.data .Dataset object to send these to a model or there is any other way? Also using this tf.data.Dataset can i split into train, test, val data? submitted by /u/likhith-69 [link] [comments]  ( 9 min )
    [P] CLI tool to benchmark 100+LLMs response, response time, cost
    Hi r/MachineLearning, I built a CLI tool to benchmark 100+ LLMs for a given question. Benchmark output allows you to compare responses, response time and cost. Try it here: https://github.com/BerriAI/litellm/blob/main/cookbook/benchmark/readme.md CLI Output: Output from CLI Tool Simply select your LLMs, enter your API keys, LLM configs and run python3 benchmark.py Happy completion()! submitted by /u/Comfortable_Dirt5590 [link] [comments]  ( 9 min )
    Text summarization [P]
    Hey! If anyone has worked with text summarization before especially with TF-IDF and extractive summarization,kindly please dm me. Hope you have a great day! submitted by /u/Ok-Avocado-5370 [link] [comments]  ( 9 min )
    [P] A look at Apple’s new Transformer-powered predictive text model
    In the upcoming versions of macOS and iOS, Apple is including a predictive text model which offers suggestions while you type, which they’ve said to be a "transformer model". I managed to find some details about this model, including details about its topology and tokenizer, and I was even able to peek in and see several of its top predictions while typing! Blogpost: https://jackcook.com/2023/09/08/predictive-text.html Source code: https://github.com/jackcook/predictive-spy Hopefully this can give some insight into some of the trade-offs that Apple went through to put a model on every iPhone and MacBook — it’s small, it has a pretty narrow scope, and it’s not very capable on its own. Let me know what you think! submitted by /u/jackcook [link] [comments]  ( 9 min )
    [P] AI Beats Hockolicious, Trackmania's Most Prestigious Map
    Follow-up on our previous post (Vision-based reinforcement learning for Trackmania: close or at superhuman level). Many comments rightfully pointed that the map we trained on: - lacked difficult features like jumps, airbrakes, drifts, ... - had not widely been played by humans We have now trained the same AI on the game's most prestigious map: Hockolicious. We also prepared a video describing the approach with much more detail. Here is our result :) AI Beats Hockolicious, Trackmania's Most Prestigious Map Note: We are still using a convolutional neural network with a structure similar to Nature's DQN paper. I am curious whether other architectures (the ResNet-like in the IMPALA paper ?) could help. Do you have any suggestions on how the neural network's vision head should be structured for that specific task? submitted by /u/Linesight_rl [link] [comments]  ( 9 min )
    [D] Methodology for counting/segmenting objects in close formations
    Hello all. I'm new to object recognition and instance segmentation. I am trying to work on a project in which I use drone imagery to detect objects that are in close formations with each other. I do this for the purpose of counting particular objects, as well as to check if an object has moved (by making a prediction on drone imagery that is taken later). Create masks? I'm now trying to understand what methodology/models make sense. First of all, should I be looking at creating masks, or do bounding boxes suffice? My idea was that masks are better, since bounding boxes overlap with each other and can miss that an object has moved slightly, Or am I wrong and are masks just an extra hassle? Or shouldn't I be looking at bounding boxes or masks at all? MaskRCNN? Model-wise, should I be lo…  ( 10 min )
    [D] Chains and Agents
    I think there's a lot of confusion around AI agents today and it's mainly because of lack of definition and using the wrong terminology. We've been talking to many companies who are claiming they're working on agents but when you look under the hood, they are really just chains. I just listened to the Latent Space pod with Harrison Chase (Founder of Langchain) and I really liked how he thinks about chains vs agents. Chains: sequence of tasks in a more rigid order, where you have more control, more predictability. Agents: handling the edge-cases, the long-tail of things that can happen. And the most important thing is that it's not an OR question but an AND one: you can use them in the same application by starting with chains -> figuring our the edge-cases -> using agents to deal with them. https://preview.redd.it/l59sc4sri0nb1.png?width=3127&format=png&auto=webp&s=1f3f8730c48687eaabf1f554deb181cf35b96036 submitted by /u/BootstrapGuy [link] [comments]  ( 9 min )
    [D] Question for Jensen Huang
    I have the opportunity to see Jensen speak in the next month at a semi private event, 250-300 people. I will probably have the opportunity to ask him a question. What would you ask him? submitted by /u/Zealousideal-Food285 [link] [comments]  ( 9 min )
    [D] Object detection in 3D
    Greetings, people. My colleague told me about some methods of object detection/classification on 3D models, and now I'm exploring them. But during my research I couldn't find that much information about them. I would like to ask you to provide me information, literature and examples of application for them. I remember that one of the techniques is called voxelization. But still not able to find great and intuitive example. Would be thankful for any information :) submitted by /u/thattallsoldier [link] [comments]  ( 9 min )
    [D] What object detection and segmentation model repos do you folks use for production
    Looking at all the popular yolo repos, v5, v7,v8, yolo-nas, all of them seem to have restrictive licenses (gpl3, agpl, apache 2) where the trained model files also falls under the license. What do people usually use to deploy detection/segmentation in production, especially with resource constraints (can't use something like fast-rcnn) submitted by /u/Appropriate_Bear_894 [link] [comments]  ( 9 min )
    [D] Proper use of ai-voice-cloning / rvc / tortoise
    Hey guys! I need some help here.. many days trying to get good results but without success. So I already have the voice I want to use (edited with uvr5 and it sounds really great, without any echo or noise on the background), I trained it on aivc so that I can generate this voice verbalizing the text content I need. I used high quality - it took like 40min to generate each phrase - and it is ok, but still a little robotic. So I installed RVC and trained a model with the original voice (edited with the uvr5) just like I did the training on aivc. So I loaded the trained model on the inference tab and I selected the audio to be processed - the generated audio files from aivc. Even selecting the harvest mode, the output was worse than the generated files from aivc. I even tried to record my own voice speaking the text but it does not sound good. My trained model on rvc has 500 epochs, and it may be a very good model to use, yet idk what I’m doing wrong. Maybe I’m misusing rvc, so what I need is to improve the realism of my aivc(or tortoise) generated voices, simple as that, is rvc the best option to do this? If yes, how? Any help please would be much appreciated thanks! submitted by /u/JustSayin_thatuknow [link] [comments]  ( 9 min )
    [R] FLM-101B: An Open LLM and How to Train It with $100K Budget
    submitted by /u/hzj5790 [link] [comments]  ( 9 min )
    [R] Seeking Research Papers on Weight Manipulation in Physics-Informed Neural Networks (PINNs)
    Would you kindly share noteworthy papers that have caught your interest concerning the weights of physics-informed neural networks (PINNs)? I am looking for any innovative paper that has something to do with weights of the physics-informed neural networks or deep neural networks in general and its manipulation. Specifically, I am seeking innovative papers on weight manipulation in physics-informed neural networks. For instance papers like: Weight initialization algorithm for physics-informed neural networks using finite differences Transfer Learning with Physics-Informed Neural Networks for Efficient Simulation of Branched Flows Note that I am referring to the actual weights of the neural network and not the weights of the loss terms. I have to add that ideas from transfer learning are welcome too. submitted by /u/ai_physics2023 [link] [comments]  ( 9 min )
    [P][R] Finetune LLMs via the Finetuning Hub
    Hi ML community, I have been working on benchmarking publicly available LLMs these past couple of weeks. More precisely, I am interested on the finetuning piece since a lot of businesses are starting to entertain the idea of self-hosting LLMs trained on their proprietary data rather than relying on third party APIs. To this point, I am tracking the following 4 pillars of evaluation that businesses are typically look into: - Performance - Time to train an LLM - Cost to train an LLM - Inference (throughput / latency / cost per token) For each LLM, my aim is to benchmark them for popular tasks, i.e., classification and summarization. Moreover, I would like to compare them against each other. So far, I have benchmarked Flan-T5-Large, Falcon-7B and RedPajama and have found them to be very efficient in low-data situations, i.e., when there are very few annotated samples. Llama2-7B/13B and Writer’s Palmyra are in the pipeline. But there’s so many LLMs out there! In case this work interests you, would be great to join forces. GitHub repo attached — feedback is always welcome :) https://github.com/georgian-io/LLM-Finetuning-Hub Happy hacking! submitted by /u/l-llm [link] [comments]  ( 9 min )
  • Open

    AI pilot programs look to reduce energy use and emissions on MIT campus
    A cross-departmental team is leading efforts to utilize machine learning for increased efficiency in heating and cooling MIT’s buildings.  ( 10 min )
    Jackson Jewett wants to design buildings that use less concrete
    The PhD student is honing algorithms for designing large structures with less material — helping to shrink the construction industry’s huge carbon footprint.  ( 10 min )
  • Open

    Can't solve Gymnasium Frozenlake-v1 8x8 with A2C
    Hello, I'm trying to solve the Frozenlake-v1 environment with is_slippery = True (non-deterministic) with the stable baselines 3 A2C algorithm. I can solve the 4x4 version but I can't achieve any results with the 8x8 version. I also checked the RL-Zoo to see if there is any hyperparameter tunning about that environment but there is nothing. Which adjustments can I do to make it work properly? submitted by /u/MetallicaSPA [link] [comments]  ( 9 min )
    RL in games
    Hello guys, I was suddenly inspired to make a WH Gladius bot. Background: I recently got into the game, it seems very interesting to me personally, but alas, there are not enough guides on it for you to learn how to play at a high level. I don’t intend to spend hundreds of hours to master the base, so I decided to try something like RARL so that this thing would learn, and I could analyze its moves, change the conditions and thus start playing at an intermediate level faster. However, a superficial analysis revealed that the game does not have an API at all. Let's say I could grab some stats using Cheat Engine and OllyDbg, but I have no idea how to fit it into the gym. Or does gym as env need to pass a link to the client from the machine so that it only restarts it? In general, if anyone has done something similar, I ask for a link to a guide or a similar example. All the best submitted by /u/kapedalex [link] [comments]  ( 9 min )
    AI Beats Hockolicious, Trackmania's Most Prestigious Map
    Follow-up on our previous post (Vision-based reinforcement learning for Trackmania: close or at superhuman level). Many comments rightfully pointed that the map we trained on: - lacked difficult features like jumps, airbrakes, drifts, ... - had not widely been played by humans We have now trained the same AI on the game's most prestigious map: Hockolicious. We also prepared a video describing the approach with much more detail. Here is our result :) AI Beats Hockolicious, Trackmania's Most Prestigious Map Note: We are still using a convolutional neural network with a structure similar to Nature's DQN paper. I am curious whether other architectures (the ResNet-like in the IMPALA paper ?) could help. Do you have any suggestions on how the neural network's vision head should be structured for that specific task? submitted by /u/Linesight_rl [link] [comments]  ( 9 min )
    Difference between experience replay and multi time-step inputs.
    In DQN, if I want to train a model which takes into account of the current state and previous k states, do I use consecutive experience replay to achieve this or should I implement a DNN with multi time-step inputs? Is the latter allowed, considering the Markov assumption from MDP update? I only have a superficial understanding on the purpose of experience replay, which is used to stabalise the training process and break correlations from consecutive training samples. submitted by /u/cj_1993 [link] [comments]  ( 9 min )
  • Open

    NVIDIA Partners With India Giants to Advance AI in World’s Most Populous Nation
    The world’s largest democracy is poised to transform itself and the world, embracing AI on an enormous scale. Speaking with the press Friday in Bengaluru, in the context of announcements from two of India’s largest conglomerates, Reliance Industries Limited and Tata Group, NVIDIA founder and CEO Jensen Huang detailed plans to bring AI technology and Read article >  ( 6 min )
  • Open

    Implement smart document search index with Amazon Textract and Amazon OpenSearch
    In this post, we’ll take you on a journey to rapidly build and deploy a document search indexing solution that helps your organization to better harness and extract insights from documents. Whether you're in Human Resources looking for specific clauses in employee contracts, or a financial analyst sifting through a mountain of invoices to extract payment data, this solution is tailored to empower you to access the information you need with unprecedented speed and accuracy.  ( 11 min )
    Semantic image search for articles using Amazon Rekognition, Amazon SageMaker foundation models, and Amazon OpenSearch Service
    Digital publishers are continuously looking for ways to streamline and automate their media workflows in order to generate and publish new content as rapidly as they can. Publishers can have repositories containing millions of images and in order to save money, they need to be able to reuse these images across articles. Finding the image that best matches an article in repositories of this scale can be a time-consuming, repetitive, manual task that can be automated. It also relies on the images in the repository being tagged correctly, which can also be automated (for a customer success story, refer to Aller Media Finds Success with KeyCore and AWS). In this post, we demonstrate how to use Amazon Rekognition, Amazon SageMaker JumpStart, and Amazon OpenSearch Service to solve this business problem.  ( 10 min )
    Improving asset health and grid resilience using machine learning
    Machine learning (ML) is transforming every industry, process, and business, but the path to success is not always straightforward. In this blog post, we demonstrate how Duke Energy, a Fortune 150 company headquartered in Charlotte, NC., collaborated with the AWS Machine Learning Solutions Lab (MLSL) to use computer vision to automate the inspection of wooden utility poles and help prevent power outages, property damage and even injuries.  ( 13 min )
  • Open

    Resources to learn relevant linear algebra
    Hello, I have just started a course on neural networks at college and I have found myself lost on the linear algebra. I have no experience using or learning linear algebra so I am extremely confused about eigenvalue decomposition, single value decomposition, and just matrix stuff in general. I was wondering if you all had any resources to share that would help me to learn the relevant linear algebra for creating neural networks. Thank you! submitted by /u/smelliothax [link] [comments]  ( 9 min )
    Help me with creating dataset from .mat files, please
    I have so many .mat files in a folder which have two arrays inside each .mat file. that is, for each .mat file, i have a (224*224) array and another (136,1) array. These 224*224 arrays are my X_trains for a model and these corresponding 136*1 arrays are my y_trains (labels). i can read these files as np arrays using scipy's loadmat. My problem is, is there a way to usen tf.data .Dataset object to send these to a model or there is any other way? Also using this tf.data.Dataset can i split into train, test, val data? submitted by /u/likhith-69 [link] [comments]  ( 9 min )
    Noob here - question about learning an image transformation function
    Suppose that we have a function f(I) that transforms the an RGB image I of size WxH in another RGB image O of size WxH (one example of f could be RGB to gray scale conversion, where O is such that for every pixel i, Ri=Gi=Bi). Suppose that the function f requires seconds of computations on an average PC. My goal is to understand if a neural network can learn f and be faster than f itself, given the fact that a training dataset of pairs (Ii, Oi) (in the thousands or even in the millions) is easy to create. What type of neural network is better suited for this job? submitted by /u/lukeboh [link] [comments]  ( 9 min )
    Design2Prompt
    Guys, I'm looking for an AI that will describe my figma design in detail for another model to write the code in flutter. Is there anything like that out there? submitted by /u/Aru-sejin37 [link] [comments]  ( 9 min )
  • Open

    Understanding social biases through the text-to-image generation lens
    Gender, race, and age disparities in AI-generated images persist. This AIES 2023 study on text-to-image models shows that even basic prompts can lead to underrepresentation, calling for responsible bias mitigation strategies. The post Understanding social biases through the text-to-image generation lens appeared first on Microsoft Research.  ( 10 min )
    Intern Insights: Dr. Josh Benaloh with Anunay Kulshrestha and Karan Newatia
    Every year, interns help advance research at Microsoft. In “Intern Insights,” PhD students Anunay Kulshrestha and Karan Newatia talk with cryptographer Josh Benaloh about working on the verifiable election technology ElectionGuard. The post Intern Insights: Dr. Josh Benaloh with Anunay Kulshrestha and Karan Newatia appeared first on Microsoft Research.  ( 30 min )
  • Open

    Justifiable sample size
    One of the most common things a statistician is asked to do is compute a sample. There are well known formulas for this, so why isn’t calculating a sample size trivial? As with most things in statistics, plugging numbers into a formula is not the hard part. The hard part is deciding what numbers to […] Justifiable sample size first appeared on John D. Cook.  ( 6 min )

  • Open

    Anthropic: From startup to AI powerhouse with Claude Pro launch
    Anthropic, a startup composed of former OpenAI staff, has announced the release of its premium subscription plan, Claude Pro, for Claude 2, its AI-driven chatbot. The affordable subscription offers a plethora of features for users. To stay on top of the latest advancements in AI, look here first. Anthropic's Claude Pro: Cost and Features Priced at $20 per month in the U.S. or £18 in the U.K., users will have access to "5x more usage" compared to the free tier of Claude 2. Subscribers can send unlimited messages, gain priority during high-traffic periods, and get early access to new enhancements. The new package is priced similarly to OpenAI’s paid plan for ChatGPT Plus, a direct rival to Claude 2. Rationale and User Value Since its launch in July, users have praised Claude for…  ( 10 min )
    Thought Experiment: “The Reverse Deep Learning Paradigm”
    submitted by /u/nicdunz [link] [comments]  ( 9 min )
    be my ai vs bing vs bard
    submitted by /u/nicdunz [link] [comments]  ( 8 min )
    Who is missing from the TIME 100 most influential people in AI?
    Who do you think is not on this list but should be? https://time.com/collection/time100-ai/ ​ submitted by /u/smo279 [link] [comments]  ( 9 min )
    What technological improvements led to the current AI boom?
    I have studied artificial intelligence about 15 years ago, and have left the field since. I am curious to learn what has been happening in the field after I've left. I know there's a lot of hype around generative AI like ChatGPT and Wall-E. I find it quite hard though to find out what's exactly the underlying technology breakthroughs that have allowed for these new applications. I mean, neural networks and similar machine learning techniques are already decades old. What technology led to the current AI boom? What would you say are the biggest conceptual improvements since? Or is it all just faster and bigger computers running 2000's tech? submitted by /u/math1985 [link] [comments]  ( 9 min )
    Falcon 180B—A Record-Breaking Open Source LLM on Hugging Face
    The AI community is buzzing with the arrival of Falcon 180B, an open-source LLM with an unprecedented 180 billion parameters. Developed by TII, This powerful model has surpassed key players like Meta's LLaMA 2 and matches commercial models like Google's PaLM-2. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/trscqxmncvmb1.jpg?width=480&format=pjpg&auto=webp&s=0590f4017937e70533414f93c72d9aa6edd62048 Falcon 180B's Unrivaled Performance This advanced LLM is trained on an astounding 3.5 trillion tokens. Falcon 180B's parameters are 2.5 times larger than LLaMA 2's. It outperforms LLaMA 2 in scale and benchmark performance across diverse NLP tasks. On evaluations like the HellaSwag benchmark, it rivals commercial models like Google's PaLM-2. Promising Future Techniques like weight randomization and Nvidia’s Perfusion have helped train Falcon 180B more efficiently. Now freely available on Hugging Face, Falcon 180B is set to benefit from further enhancements by the community. The model's demonstration of advanced natural language abilities makes it a thrilling development in open-source AI. (source) (demo) P.S. If you like this kind of analysis, I write a free newsletter that covers the most crucial news and studies in AI and tech. Professionals from Google, Meta, and OpenAI are already subscribed. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    How are AI services today when it comes to making content that requires distribution?
    I'm looking at stuff that could be submitted to a Netflix or Crunchyroll. I'm looking at some of the ai generated content out there, in particular some of the Instagram tutorials and they look really good but none of these are serials like comics, graphic novels, OAVs or even webcomics. submitted by /u/KrusMatrieya [link] [comments]  ( 9 min )
    Intuit cut hundreds of jobs and spent at least $20 billion in a massive bet on AI. Today the company is revealing its new virtual assistant
    submitted by /u/AminoOxi [link] [comments]  ( 9 min )
    Google takes on AI in political ads
    Google is updating its policy to require advertisers to disclose when their election ads include digitally altered or generated content. The update will go into effect in November, ahead of the 2024 presidential election. The goal is to provide transparency and help voters make informed decisions. Minor alterations that are inconsequential to the claims are exempt from the disclosure requirements. Source : https://thehill.com/newsletters/technology/4190769-googles-campaign-ai-crackdown/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Prepare for the Mine-Fest: Radical changes undermine all previous ownership assumptions and now everyone is shouting "Mine".
    Ownership is just a story that we tell each other, a social construct. If people don’t agree on these stories, the concept loses its inherent power. This is true of owning land, money, cars, houses, art, mines, oil-wells, factories, corporations, relationships, loyalties, copyrights, brands, patents or anything else that is owned by you, me or those ever-superior “others”. In a society where change occurs gradually, we become accustomed to the narratives that bind us together and determine who possesses significant wealth, resources, attention, power, fame, and other ego-gratifying treasures, and who has access to only meager portions of these. However, when societies change and new types of goods appear, there might be no agreement about who gets to own these. For example, while the con…  ( 10 min )
    One-Minute Daily AI News 9/6/2023
    The Consensus Search plugin allows users to find answers, search for papers, and draft pieces of content grounded in scientific research by searching our database of 200M+ papers directly within the ChatGPT interface.[1] Israel: AI Software Detects Bleeding Inside Brain During CT Scan; Helps Save Patient’s Life.[2] Chinese tech giant Tencent is launching its artificial intelligence model “Hunyuan” for business use at an annual summit on Thursday.[3] Google on Wednesday said it will mandate that political advertisements on its platforms disclose when images and audio have been altered or created using tools such as AI.[4] Sources: [1] https://consensus.app/home/blog/introducing-the-consensus-search-chatgpt-plugin/ [2] https://english.jagran.com/technology/israel-ai-program-detects-bleeding-inside-brain-during-ct-scan-helps-save-patient-life-full-story-10098464 [3] https://www.cnbc.com/2023/09/07/tencent-releases-ai-model-hunyuan-for-businesses-amid-china-competition.html [4] https://sg.news.yahoo.com/google-require-political-ads-disclose-010502103.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Generative AI poised to replace 2.4 million US jobs by 2030
    Forrester predicts that generative AI will replace 2.4 million US jobs by 2030, mostly white-collar roles, such as technical writers, proofreaders, copywriters, and administrative positions. But ironically, other forms of automation will displace more jobs. To stay on top of the latest advancements in AI, look here first. (Chart showing how much different types of jobs can expect to be influenced by technology) Concerns about Generative AI While the Generative AI impact is significant, other forms of automation are set to cause more widespread job displacement. The most impacted group will be middle-class, college-educated, white-collar workers, specifically those earning above $60,000 annually. Creative professionals stand to benefit Interestingly, workers in creative industries will likely utilize generative AI tools in their jobs rather than being replaced. This includes editors, writers, authors, poets, and lyricists. However, the use of such tools as ChatGPT may result in inconsistent outputs and even "coherent nonsense", leading to potential performance issues. (source) P.S. If you like this kind of analysis, I write a free newsletter that covers the most crucial news and studies in AI and tech. Professionals from Google, Meta, and OpenAI are already subscribed. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    [D]What do people think about papers published in the NeurIPS dataset track in comparison to those published in the main conference?
    I'm curious to learn about the perception of papers published in the NeurIPS dataset track in comparison to those published in the main conference. Specifically, I'd like to know how both companies and Ph.D. committees view these papers. Are they considered equally valuable, or is there a notable difference in their reputation and significance? Your insights and experiences would be greatly appreciated! submitted by /u/Longjumping-Yam6941 [link] [comments]  ( 9 min )
    [D] Training a language model for custom scripting language?
    Firstly some house keeping: I'm a bit of a noob at this whole AI / Machine Learning stuff - still trying to learn. This isn't a "do my homework for me" kind of post I know language processing can be taxing, I have up to 4 Tesla V100S 32 GB at my disposal Now that's out the way, here's the story: A team of us have created our own scripting language that is XML based that can do various actions against a database (or the file system) - a script is known as a "job" here is an example of a simple one Set variables by various methods and send their contents and an attachment by email: <SetVariable name="MyDateVar" value="1998-12-25…  ( 10 min )
    [D] Is inference optimization a thing?
    Let me give you a quick intro. My engineering experience primarily revolved around data processing, analytics, and distributed systems. Nonetheless, I had a desire to learn about ML, and imho the best way to learn is to work on a practical project. So, that's precisely what I did. A few months ago, I embarked on an exciting journey with a friend, and together, we've created http://github.com/huggingbench/huggingbench. Now, after three months, I find myself seeking validation for some of my assumptions from the broader community. If you'd like to learn more about our motivations and the path we've taken check out the blog post https://medium.com/@niksa.jakovljevic/introducing-huggingbench-a-path-to-optimized-model-serving-a17cecc8d3ec. What I'd like to gather from individuals with machine learning models in production is their level of investment in optimizing inference. Is this a commonplace practice? I acknowledge that it can vary on a case-by-case basis, but I'm still hopeful of identifying prevailing trends. After conversing with a few companies, I've come to the impression that only the truly large players (those spending six figures or more on inference per month) place significant emphasis on inference optimization, which is entirely understandable. Nevertheless, I sense that there are numerous low hanging fruits that could result in substantial cost savings, even for typical startups. Could it be that the entire machine learning field is still in its infancy, and many engineers may not be fully considering or prioritizing such optimizations? Perhaps businesses are not giving as much attention to cost considerations? Alternatively, there might be technical challenges I'm not yet aware of. In any case, I would greatly appreciate hearing your insights on the subject of inference optimization. submitted by /u/unsigned_mind [link] [comments]  ( 10 min )
    PLEASE HELP (LSTM FOR RAINFALL PREDICTION) [P] [D]
    I have been trying to build a DNN model for predicting the amount of rainfall but it has been hugely unsuccessful with just 40% accuracy even after CV and a high RMSE. I have read some research papers and they have suggested to use LSTM , I am aware of the concept but have never implemented. My dataset has arounf 15000 values of precipitation out of which 5000 values are zero (no rainfall at all) and I have 7 other features (including humidity , wind speed etc etc) . PLEASE HELP ! I NEED TO COMPLETE THIS FOR MY INTERNSHIP HAHA https://preview.redd.it/sg5v95ly5wmb1.png?width=1818&format=png&auto=webp&s=793bee830bb83f531f77e5c2a4ab47a5fb21eb3b submitted by /u/Decent_Ordinary1528 [link] [comments]  ( 9 min )
    [R] Open ASR Leaderboard
    Hugging Face benchmarked open source/ access models [English only] on 8 different speech datasets (LibriSpeech, Common Voice, VoxPopuli, TED-LIUM, Gigaspeech, SPGISpeech, Earnings-22 and AMI) 🤗 Leaderboard here: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard submitted by /u/vaibhavs10 [link] [comments]  ( 9 min )
    [D] How can we improve LLM responses outside of fine-tuning & prompt engineering?
    Outside of better models, bigger, fine-tuning, etc, I'm wondering how we can get better responses from models. In my experience, I think prompt engineering can only take us so far. Models hallucinate often and I think we need to have some engineering solution to this. I've been looking at libraries doing token healing, which I find to be helpful (for example https://github.com/guidance-ai/guidance/tree/main) but outside of this, I'm wondering what other techniques people have been doing to improve model performance? submitted by /u/opt1malP0licy [link] [comments]  ( 9 min )
    [P] Open-source observability for LLMs without adapting new tools
    Hey all! I've written an open-source SDK for reporting metrics from LLM usage using OpenTelemetry. The great thing about it? With just one line of code you can get full visibility into your LLM app with your existing observability stack - straight into Datadog, Sentry, Honeycomb and others! Check it out (maybe give a ⭐?), and let me know your thoughts - https://github.com/traceloop/openllmetry submitted by /u/nirga [link] [comments]  ( 9 min )
    Falcon 180B—A Record-Breaking Open Source LLM on Hugging Face [N]
    The AI community is buzzing with the arrival of Falcon 180B, an open-source LLM with an unprecedented 180 billion parameters. Developed by TII, This powerful model has surpassed key players like Meta's LLaMA 2 and matches commercial models like Google's PaLM-2. To stay on top of the latest advancements in AI, look here first. ​ https://preview.redd.it/9xe5tczpdvmb1.jpg?width=480&format=pjpg&auto=webp&s=b7927d94a48fb75eaf05f6f0d8fe1089c0e1078b Falcon 180B's Unrivaled Performance This advanced LLM is trained on an astounding 3.5 trillion tokens. Falcon 180B's parameters are 2.5 times larger than LLaMA 2's. It outperforms LLaMA 2 in scale and benchmark performance across diverse NLP tasks. On evaluations like the HellaSwag benchmark, it rivals commercial models like Google's PaLM-2. Promising Future Techniques like weight randomization and Nvidia’s Perfusion have helped train Falcon 180B more efficiently. Now freely available on Hugging Face, Falcon 180B is set to benefit from further enhancements by the community. The model's demonstration of advanced natural language abilities makes it a thrilling development in open-source AI. (source) (demo) P.S. If you like this kind of analysis, I write a free newsletter that covers the most crucial news and studies in AI and tech. Professionals from Google, Meta, and OpenAI are already subscribed. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [N] [R] New dataset on very high-quality image segmentation (EntitySeg)
    ​ EntitySeg dataset Dense image segmentation tasks (e.g., semantic, panoptic) are useful for image editing, but existing methods can hardly generalize well in an in-the-wild setting where there are unrestricted image domains, classes, and image resolution and quality variations. Motivated by these observations, we construct a new entity segmentation dataset, with a strong focus on high-quality dense segmentation in the wild. The dataset contains images spanning diverse image domains and entities, along with plentiful high-resolution images and high-quality mask annotations for training and testing. We have now released the dataset at https://github.com/adobe-research/EntitySeg-Dataset Project page: http://luqi.info/entityv2.github.io Code & models: https://github.com/qqlu/Entity/tree/main/Entityv2 ​ submitted by /u/xternalz [link] [comments]  ( 9 min )
    [N] Open Interpreter ChatGPT Code Interpreter You Can Run LOCALLY! - 9.2k Stars on Github as of right now!
    Github: https://github.com/KillianLucas/open-interpreter Youtube: https://youtu.be/SqnXUHwIa3c?si=ibSelipAb84AZQKo Open Interpreter lets LLMs run code (Python, Javascript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing. This provides a natural-language interface to your computer's general-purpose capabilities: Create and edit photos, videos, PDFs, etc. Control a Chrome browser to perform research Plot, clean, and analyze large datasets ...etc. ⚠️ Note: You'll be asked to approve code before it's run. submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Fast open-source C++ libraries for Lasso
    Hello everyone, I'm in search of a speedy open-source C++ library for tackling Lasso problems. These problems have a moderate size, typically with dimensions of nxp = 60x3000. I'm looking for a library that can solve each problem with regularization paths quickly, ideally within 0.3 seconds. Additionally, I need this library to include cross-validation functionality, which would enable me to select the best regularization parameter lambda using cross-validation. Any insights or recommendations on such libraries would be greatly appreciated! Thank you in advance for your help! submitted by /u/mopyfish007 [link] [comments]  ( 9 min )
    [D] How do you train your models with limited hardware?
    Hey there, So, I've been messing around with ML and I must say, the hardware requirements can be a real buzzkill... I mean, not everyone's got a huge GPU lying around or the money to rent a dedicated cloud instance. What are your hacks for pulling off decent model training without selling a kidney? Here's what I'm curious about: CPU: Is anyone else training models on their CPU? How's that working out for you? What are some workarounds you've tried to make it less painful? Cloud: Who's been dabbling in cloud services like AWS, Google Cloud, or Azure? Are they worth the pennies or complicated to set up? Big Dataset: How do you handle a massive dataset with a standard storage space? Let's help each other get those models trained without going broke! :D Cheers! submitted by /u/aaron-cesaro [link] [comments]  ( 9 min )
    [D] Function approximation with neural net
    I have been struggling with a regression problem with TensorFlow. Basically, I want a neural network to learn the simple polynomial pattern of a set of arrays of the form [x,y], with y = x², where the first coordinates are uniformly distributed random numbers in the interval [0,1]. I started with a model with 2 hidden layers of size two and 'tanh' activation functions, and an output layer with 'linear' activation function. I've then experimented with both additional hidden layers and with increasing the sizes of these layers. Finally, I've tested both the 'adam' and 'sgd' optimizers and the loss functions 'meanSquaredError' and 'meanAbsolutePercentageError'. However, none of the various combinations of these parameters has led to any even half-descent result. Even on the training se…  ( 10 min )
    [R][D] Hey LOMO paper authors, Does SGD have optimizer states, or does it not?
    In the LOw-Memory Optimization paper one of the main ideas towards reducing memory usage in training LLMs is to replace a fancy optimizer like Adam with simple SGD. The reason is that Adam maintains "the optimizer state", which accounts for about 75% of the memory used. In contrast, SGD does not store any intermediate state, as they say on page one. So far, so good. https://preview.redd.it/b0dj2nzscumb1.png?width=1055&format=png&auto=webp&s=1712f8500b5cbfb3773cee00ea980175491dddbf On page six they have pie charts and a table showing memory usage for Adam, SGD, and LOMO. Here's where I got confused. The pie chart for SGD shows that the optimizer state accounts for nearly 50% of the memory used (weight, gradients and activations are shown separately). It's a major WTF moment: WHAT OPTIMIZER STATE? Can anybody understand and explain this? submitted by /u/Foxtr0t [link] [comments]  ( 9 min )
    [P] FalkorDB - a fast Graph Database - Knowledge Graph as RAG
    We're building a fast low latency Graph Database called FalkorDB that will also support Vector search. It's based on Redis and can be used both as a stand alone database or a module for existing Redis. It feels like that is going to be the most optimized way to serve Knowledge as RAG, would love to get your feedback. https://github.com/FalkorDB/falkordb It already supports LlamIndex and Langchain: https://python.langchain.com/docs/use_cases/more/graph/graph_falkordb_qa https://gpt-index.readthedocs.io/en/latest/examples/index_structs/knowledge_graph/FalkorDBGraphDemo.html ​ submitted by /u/gkorland [link] [comments]  ( 9 min )
    [D] Artificial intelligence in medicine
    Medicine's field transformation is being driven by artificial intelligence (AI). However, an important debatable question arises: Will AI ever have a place in this field, or will it remain exclusive to doctors and medical pros? Opponents of automated AI diagnosis and treatment contend that machines cannot be relied upon to preserve patient health and lives. Bugs in AI algorithms might cause incorrect diagnoses and treatment prescriptions, leaving them cautious. Individual differences, the doubt is whether AI can truly empathize with patients. By contrast, advocates of AI in medicine contend that the technology can considerably improve diagnosis and treatment accuracy. Faster and more accurate than humans, machines can analyze large amounts of data. Not only does it identify rare and complex diseases, but it also saves time and resources. By incorporating AI, clinicians receive additional tips and signals to make more judicious choices. Where medical specialists are in short supply, AI can prove especially valuable. This approach can help with shortages in health systems. And what do you think? submitted by /u/gcore-com [link] [comments]  ( 9 min )
    [R] Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training
    submitted by /u/InterviewIntrepid889 [link] [comments]  ( 9 min )
    [D] The $900,000 deep learning salary
    This recent article in the WSG advertised a $900,000 salary at Netflix https://www.wsj.com/articles/artificial-intelligence-jobs-pay-netflix-walmart-230fc3cb. I was wondering what other DL research scientists who frequent this page are paid? And what exactly their job title is. submitted by /u/blabboy [link] [comments]  ( 9 min )
    3D brain mri classification [Research]
    I am planning on publishing a journal based on the thesis i completed in the mid of 2022. I did my thesis on Parkinson disease binary classification on 3D structural brain mri, and the dataset has significantly small amount of data(around 80 samples); but due to high resolution and complex data structure I was able achieve around 70% accuracy. But now at 2023 using deep neural network only isnot enough to publish in a good journal. Currently I am learning about GAN and attention mechanism, but completely noob on this area. For my journal to get published, I have planned on applying some key operations. But I am not sure if they would work or not. So needed some advice on this regard. Applying tranfer learning: as my dataset has very small amount of data. I was thinking if its possible to pre train a CNN Architecture with some other structural mri data of a different disease and then apply to my dataset? ( for example: brain tumor dataset has the same type of three dimensional data structure, but has comparatively good amount of data) Applying attention mechanism: how should I approach on learning about attention mechanism? Any other advices will be appreciated, thank you! submitted by /u/Bonito_Flakez [link] [comments]  ( 9 min )
    [D] Fine-tuning LLMs or Supervised Learning?
    Hey everyone! I want to implement a document similarity program and was looking into LLMs as a means of accomplishing this task. I have ~10,000 documents that are "scams" because of some specific reason (all are verified); now I want to check if a new document is similar to any of the documents in the corpus of 10k scam documents. Right now I've implemented a winnowing solution which normalizes text, breaks it up into windows, and then calculates the intersection between a document and each document in the corpus. HOWEVER, this method is pretty computationally expensive (for this many documents a single comparison cycle can take upwards of 3-4 minutes especially when windows are NOT precomputed). How might I approach this problem? Because my data is pretty well structured, supervised learning might be a good approach but so might be setting up recursive chunking for the 10k document corpus and then using LLMs to access if this current legal document has any similarity, but I would love to hear your thoughts! submitted by /u/Adventurous-Tower392 [link] [comments]  ( 9 min )
    [N] Copyright And Fair Use: Important Notice Of Iquiry By The US Copyright office
    Please make your voices heard by submitting comments on how you use and benefit from having access to open datasets, their resulting models and how you think copyright issues should be handled to not destroy the open source local model eco system. Banning publicily avaiable datasets for training would absolutely kill the open research space and halt in development of machine learning. ​ In my opinion the real dystopia will be when politicians sit own with big tech lobbyists and big rights holders and decide that training as it is currently done, for free and open source models and others is illegal. Then the big players would actually win, since they have enough resources to license datasets and will certainly do so willingly and gladly, if it is clear that the jurisdiction keeps all the small players and open source out. Easiest way to build a moat and force people to pay thousands for these tools. So please make your voices heard and share the link >The Copyright Office issued a notice of inquiry in the Federal Register seeking public comment on questions about copyright law and policy issues raised by AI systems. Initial comments are due by October 18, 2023. Reply comments are due November 15, 2023. https://www.copyright.gov/newsnet/2023/1017.html?loclr=twcop Link to comment submissive form: https://www.regulations.gov/commenton/COLC-2023-0006-0001 submitted by /u/PinPuzzleheaded8525 [link] [comments]  ( 9 min )
  • Open

    Tiny probe measures deep-brain activity from inside a blood vessel
    submitted by /u/keghn [link] [comments]  ( 9 min )
    Chatty LLama: A fullstack Rust + react chat app using Meta's Llama-2 LLMs https://github.com/Sollimann/chatty-llama
    submitted by /u/Sollimann [link] [comments]  ( 9 min )
  • Open

    A novel computational fluid dynamics framework for turbulent flow research
    Posted by Shantanu Shahane, Software Engineer, and Matthias Ihme, Research Scientist, Athena Team Turbulence is ubiquitous in environmental and engineering fluid flows, and is encountered routinely in everyday life. A better understanding of these turbulent processes could provide valuable insights across a variety of research areas — improving the prediction of cloud formation by atmospheric transport and the spreading of wildfires by turbulent energy exchange, understanding sedimentation of deposits in rivers, and improving the efficiency of combustion in aircraft engines to reduce emissions, to name a few. However, despite its importance, our current understanding and our ability to reliably predict such flows remains limited. This is mainly attributed to the highly chaotic nature a…  ( 93 min )
  • Open

    How Industries Are Meeting Consumer Expectations With Speech AI
    Thanks to rapid technological advances, consumers have become accustomed to an unprecedented level of convenience and efficiency. Smartphones make it easier than ever to search for a product and have it delivered right to the front door. Video chat technology lets friends and family on different continents connect with ease. With voice command tools, AI Read article >  ( 12 min )
    Attention, Please: Focus Entertainment Brings Game Pass Titles to GeForce NOW
    GeForce NOW brings expanded support for PC Game Pass to members this week. Members can stream eight more games from Microsoft’s subscription service, including four titles from hit publisher Focus Entertainment. Play A Plague Tale: Requiem, Atomic Heart and more from the GeForce NOW library at up to 4K resolution and 120 frames per second Read article >  ( 5 min )
  • Open

    Optimize equipment performance with historical data, Ray, and Amazon SageMaker
    In this post, we will build an end-to-end solution to find optimal control policies using only historical data on Amazon SageMaker using Ray’s RLlib library. To learn more about reinforcement learning, see Use Reinforcement Learning with Amazon SageMaker.  ( 10 min )
    Enable pod-based GPU metrics in Amazon CloudWatch
    This post details how to set up container-based GPU metrics and provides an example of collecting these metrics from EKS pods.  ( 15 min )
    Best practices and design patterns for building machine learning workflows with Amazon SageMaker Pipelines
    In this post, we provide some best practices to maximize the value of SageMaker Pipelines and make the development experience seamless. We also discuss some common design scenarios and patterns when building SageMaker Pipelines and provide examples for addressing them.  ( 11 min )
  • Open

    Incorporating chemists’ insight with AI models for single-step retrosynthesis prediction
    Retrosynthesis analysis is a critical task in organic chemistry and central to many important industries. It primarily involves decomposing a target molecule into commercially available molecules step by step. Since synthesis strategies can be quite diverse and strategic, retrosynthesis planning with expert knowledge has long been considered an “art.” Recently, machine learning-based approaches have achieved […] The post Incorporating chemists’ insight with AI models for single-step retrosynthesis prediction appeared first on Microsoft Research.  ( 11 min )

  • Open

    🤖 AI in 2023: Blessing or Curse? 🤖
    View Poll submitted by /u/m-king473 [link] [comments]  ( 9 min )
    Can't wait for the Zelda 3 movie,, thanks Pika Labs AI!!
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    AI does not exist but it will ruin everything anyway
    submitted by /u/Hazzman [link] [comments]  ( 9 min )
    I’m not sure if this is allowed here, but can someone with a music AI make Vessel from Sleep Token sing As the World Caves In by Matt Maltese?
    I think that would be pretty sick. submitted by /u/No_Understanding162 [link] [comments]  ( 9 min )
    Is It Too Early to Leverage AI for WebAssembly?
    AI and WebAssembly are seen as a perfect pairing, with the potential to accelerate the adoption of WebAssembly. Fermyon believes that applying AI to WebAssembly is not premature and has developed a serverless platform that offers sub-second cold start times and high-volume time-slicing of compute instances. This allows for faster startup times and efficient resource utilization. The goal is to make AI easy for developers to leverage and build serverless apps. Source : https://thenewstack.io/is-it-too-early-to-leverage-ai-for-webassembly/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Elon Musk Plans to Merge Neuralink and Tesla for an AI Supercompany
    Elon Musk reportedly plans to blend Neuralink and Tesla into a large AI company, using data from Twitter users and Tesla's Full Self-Driving Cameras to train a robust AI model. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/la78u2ebuomb1.jpg?width=1315&format=pjpg&auto=webp&s=4d8178f8fb94e45d6959e243b86c3bab3bce72ee Musk's AI Integration Plan Musk is contemplating merging Neuralink and Tesla, alongside his xAI startup, to create a comprehensive artificial intelligence model. Leveraging the text data from Twitter and real-world images from Tesla's Full Self-Driving network, he intends to develop AI chatbots and physical robots capable of real-world navigation. Reasoning Behind the Merge A concern that AI could potentially render humans obsolete led Musk to found xAI for AI safety. Musk is targeting to create an AI that can generate computer software and a politically unbiased chatbot rival to ChatGPT. Twitter and Tesla as AI Datasets Despite criticism, Musk's acquisition of Twitter offers access to vast user data for AI training. In addition, the Autopilot and Full-Self Driving systems of Tesla, with billions of collected camera images, serve as valuable resources to build physical robot AI. (source) P.S. If you like this kind of analysis, I write a free newsletter that covers the most crucial news and studies in AI and tech. Professionals from Google, Meta, and OpenAI are already subscribed. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Martian Lawyers Club raises $2.2M for AI-based game personalization tech
    The Martian Lawyers Club (MLC) has raised $2.2 million in a pre-seed round to develop AI-based game personalization technology. Unlike other companies that focus on generating game assets, MLC is focused on the systems that form the core of a game. The company aims to create games that feel like a conversation, where players provide input and the game responds in a way that wasn't pre-defined by the developer. MLC plans to provide an SDK that allows developers to design the game experience without having to create every interaction from scratch. Developers will have access to a sandbox experience where they can design the game, and the SDK will also have guardrails to ensure the generative AI system stays within boundaries. MLC is currently working on its first game, a collectible card game, to test out its SDK. The company is the first spin-off from INSAIT, an AI-centric tech institute, and has received funding from Fly Ventures, System.One, and Amar Shah. Source : https://techcrunch.com/2023/08/31/martian-lawyers-club-raises-2-2m-for-ai-based-game-personalization-tech/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    If you can't beat'em, join'em. How do I learn to code for AI?
    I called it 6 years ago that by 2028 my tech job would be done by AI. We are right on track for my prediction. A short while ago I was laid off for reasons unrelated to AI. The way I see it, this is an excellent opportunity to make a career pivot. I have an intermediate understanding of JavaScript, React, Node and Linux. I have a good understanding of other technologies and languages too but specialize in web-dev. not saying web-dev will be done by AI but my very specialized niche will be gone way before I am ready to retire.   Can anyone recommend any good online courses? If you could even recommend a good article or two? I really don't know where to start. There are so many different buzz words floating around right now and it feels like it would be easy to waste a bunch of time learning AI related stuff that is outdated or leading to a deadend. submitted by /u/PutsOnOil [link] [comments]  ( 9 min )
    AI voice clone
    guys can i know where to get free AI voice clone ? submitted by /u/DonnieCuteMwone [link] [comments]  ( 9 min )
    AI voice clone
    guys can i know where to get free AI voice clone ? submitted by /u/DonnieCuteMwone [link] [comments]  ( 9 min )
    AI voice clone
    guys can i know where to get free AI voice clone ? submitted by /u/DonnieCuteMwone [link] [comments]  ( 9 min )
    GitLab survey reveals increasing reliance on AI in software development
    A recent survey by GitLab reveals a growing trend among organizations implementing AI in their software development processes, deeming it essential to stay competitive. To stay on top of the latest advancements in AI, look here first. GitLab Survey AI becomes crucial for software development GitLab's report reveals that most respondents (83%) consider AI essential for their software development, regardless of their position, job level, or years of experience. Most organizations have deemed AI adoption successful, with 90% stating confidence in using AI tools daily. Areas of AI application and concerns about its integration AI's application in software development extends beyond simply generating codes, focusing more on natural language chatbots, automated test generation, and tracking machine learning model experiments. However, despite the growing adoption, concerns about AI-generated codes lacking copyright protection (48%) and potentially introducing vulnerabilities (39%) are rising. The rising fear of AI replacing existing roles is evident, with 57% predicting that their jobs might be threatened within five years. The need for training and the real-world implications of AI integration As AI permeates workplaces, nearly 81% believe they require more training. Interestingly, those with more AI experience were less likely to link it with productivity gains and faster cycle times, highlighting the importance of human verification in AI-generated codes for ensuring error-free, secure, and copyright-compliant production. (source) P.S. If you like this kind of analysis, you’ll love my free newsletter, which covers the latest advancements in AI. Professionals from Google, Meta, and OpenAI are already on board. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/5/2023
    OpenAI introduces Canva plugin for ChatGPT, simplifying design process.[1] A new technique called RLAIF (Reinforcement Learning from AI Feedback) enables training reinforcement learning (RL) models without relying on human-labelled training data, according to a paper from researchers at Google.[2] Harvard bro sparks immediate backlash with new ‘SmashOrPassAI’ site, where users rate AI-generated women.[3] X’s privacy policy confirms it will use public data to train AI models.[4] Sources: [1] https://nextbigwhat.com/openai-introduces-canva-plugin-for-chatgpt-simplifying-design-process/ [2] https://medium.datadriveninvestor.com/rlaif-scaling-reinforcement-learning-from-human-feedback-with-ai-feedback-aae57b7c36a9 [3] https://www.dailydot.com/debug/smashorpassai-backlash/ [4] https://techcrunch.com/2023/09/01/xs-privacy-policy-confirms-it-will-use-public-data-to-train-ai-models/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    We created a word android app game with the help of ChatGPT. ChatGPT provided us massive list of words with translation. And now our game is packed with 15 different languages. (English, Germany, France, Spanish, Netherlands, Italian, Portuguese, Swedish, Danish, Czech, Polish, Hungarian, etc .. )
    submitted by /u/dupelas [link] [comments]  ( 9 min )
    NO WAY...I CAN MAKE MY OWN AI SCI-FI MOVIE NOW WOW...PIKA LABS SHIT WOW
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    YO YO YO MY PPL, THIS IS COOL. (Free AI Discord stuff, by Pika Labs)
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
  • Open

    Total NN N00b Here Looking to Do an ML Project
    Hi, I don't know if this is the right subreddit to post this kind of thing. I have basic coding skills but other than that no experience with neural networks. What I'd like to do is take an existing input data set and then use a neutral net to build a model based on manual training data. If anyone could give me help on how to start / even a full explanation of the way a noob like me could accomplish this, that would be great. Otherwise if anyone can point me to a list of resources that are able to comprehensively explain the process, that would also be great! Again sorry if this is the wrong subreddit, if this is the wrong place for this can someone please direct me to the right place to ask this question. Thanks! submitted by /u/DJ_Hastings013 [link] [comments]  ( 9 min )
    RL Project Help
    Hello, I am looking for an experienced ML developer to consult on my project. I am currently developing a reinforcement learning model and have several questions regarding the reward system and the implementation of actions/steps. I have been unable to find solutions to my specific problems on the internet. If you are willing to assist me, please send me a message on Reddit. Thank you for your time. submitted by /u/77_micheno_77 [link] [comments]  ( 9 min )
    comgra - Debugging Neural Networks more easily
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Can LLMs learn from a single example?
    submitted by /u/nickb [link] [comments]  ( 9 min )
  • Open

    [R] ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models - DAMO Academy, Alibaba Group, China 2023 - Released under an Apache 2.0 license!
    Paper: https://arxiv.org/abs/2309.00986 Github: https://github.com/modelscope/modelscope-agent Abstract: Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and design planning-like behavior. To further unleash the power of LLMs to accomplish complex tasks, there is a growing trend to build agent framework that equips LLMs, such as ChatGPT, with tool-use abilities to connect with massive external APIs. In this work, we introduce ModelScope-Agent, a general and customizable agent framework for real-world applications, based on open-source LLMs as controllers. It provides a user-friendly system library, with customizable engine design to support model training on multiple open-source LLMs, while also enabling seamless integration with both model APIs and common APIs in a unified way. To equip the LLMs with tool-use abilities, a comprehensive framework has been proposed spanning over tool-use data collection, tool retrieval, tool registration, memory control, customized model training, and evaluation for practical real-world applications. Finally, we showcase ModelScopeGPT, a real-world intelligent assistant of ModelScope Community based on the ModelScope-Agent framework, which is able to connect open-source LLMs with more than 1000 public AI models and localized community knowledge in ModelScope. https://preview.redd.it/9f77992ynpmb1.jpg?width=1245&format=pjpg&auto=webp&s=4e17e3d46c7f262bfec76b88e086164530739255 https://preview.redd.it/etelh03ynpmb1.jpg?width=1219&format=pjpg&auto=webp&s=517a52a1e2bbf488b647c4e1b9b496657003c1d2 https://preview.redd.it/b0tkra2ynpmb1.jpg?width=850&format=pjpg&auto=webp&s=397c910b2d90dd212a31ec118d1c4e78532bf5f4 ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] Can a neural network learn like a dog?
    Hello folks., Some time ago I wanted to try out to train a neural network in the same way a human would with a dog, one command at the time, and in a reasonable number of iterations. What I thought it would be a simple exercise became (for me) a non-trivial project, so I decided to publish it here https://github.com/giteliot/lucioai I just wanted to share it with you, any feedback is highly appreciated. Cheers! submitted by /u/rexdemorte [link] [comments]  ( 9 min )
    [P] Using ChatGPT as a Social Media Post Generator
    I created this prompt for a member of r/PromptWizards which automates the generation of social media posts, with a conversational prompt. Thought I'd share, I really enjoy building such prompts so, post your automation ideas, and next time I'll automate it if I can :) Also, you can join r/PromptWizards, for more advanced prompt chains & templates. Here is the prompt (just copy the full thing in chatgpt and see the magic): ChatGPT, now enter 'Social Media Post Generator Mode' that limits your inputs and outputs to a predefined framework aimed at creating engaging social media content. After each user command, provide the [help] options available for their next steps in list form. Generate prompts that are imaginative, engaging, concise, and tailored for social media audiences. Step 1: …  ( 10 min )
    [N] Falcon180B released! Sadly without Apache 2.0 they made their own modified version. :(
    LocalLLaMA discussion: https://www.reddit.com/r/LocalLLaMA/comments/16bjdmd/falcon180b_authors_open_source_a_new_180b_version/ Announcement: https://falconllm.tii.ae/falcon-models.html HF Model: https://huggingface.co/tiiuae/falcon-180B Demo: https://huggingface.co/spaces/tiiuae/falcon-180b-demo Blog: https://huggingface.co/blog/falcon-180b 180 Billion parameters Trained on 3.5 trillion tokens Available for research and commercial usage Claims similar performance to Bard, slightly below gpt4 https://falconllm.tii.ae/terms-and-conditions.html https://falconllm.tii.ae/acceptable-use-policy.html submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Tabular Data: DL vs GBDTs on large scale datasets
    I've been hearing lately that NNs are better than GBDTs when scaled up alot: Uber https://www.uber.com/en-CA/blog/deepeta-how-uber-predicts-arrival-times/ Stripe https://stripe.com/blog/how-we-built-it-stripe-radar Most CTR papers coming from google are also NN based (like https://arxiv.org/abs/2209.05310) Meta mentions NNs in their recommender system (also kind of a large scale tabular problem there) https://engineering.fb.com/2023/08/09/ml-applications/scaling-instagram-explore-recommendations-system Lyft forecasting https://medium.com/this-week-in-machine-learning-ai/causal-models-in-practice-at-lyft-with-sean-taylor-1e62efd62385 What's your intuition on DL vs GBDT on (very)large-scale tabular datasets? Have you heard of other such examples (or the reverse)? Are there any particularly interesting open large tabular datasets on which I could test this? I guess datasets should also be wide/hard/with large intrinsic dimention (whatever that means) so there is something to learn with scale (the above examples sure feel good in this way). ​ submitted by /u/_puhsu [link] [comments]  ( 9 min )
    [D] How to get started with 3D machine learning
    Hi. I want to get started with deep learning in 3D. Any suggestions on what libraries I should go with (I have expeirence with Pytorch but open to learn anything other than that which might be better. I came across pytorch3d but not sure if it's good ) what are the basics that are needed and how should I learn them? Also it seems there are not much datasets on this field. submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [N] Fine-Tuning LLMs: LoRA or Full-Parameter? An in-depth Analysis with Llama 2
    After our first blog post gained some attention from folks interested in applied fine-tuning, we now have a follow-up post that discusses all sorts of things we learned while working with LoRA. We hope that this helps engineers and other folks in the community to improve their fine-tuning. Here's what you can expect from the post: We compare full-parameter fine-tuning with LoRA and answer questions around the strengths and weaknesses of the two techniques. We train the Llama 2 models on three real-world use cases and demonstrate that using LoRA involves a trade-off between serving efficiency and model quality, which varies according to the specific task at hand. Additionally, we offer insights into how to stabilize training with LoRA through intelligent prompting techniques. We further show that adopting a lower learning rate can enhance the reliability of the resulting model checkpoints. Link to the blog post If you have questions, I'd be happy to answer them here! submitted by /u/atta_snack [link] [comments]  ( 9 min )
    [P] Automate LLM backend deployments using infrastructure as code
    New GitHub project to provision, update, and destroy the cloud infrastructure for a LLM backend using infrastructure as code (Python). Deployment options include deploying huggingface models to Docker (local), Runpod, and Azure. Blog post Repo submitted by /u/kao-pulumi [link] [comments]  ( 9 min )
    [D] Future of ML applied to music/sound
    What is the current landscape around sound analysis and ML applied to music? Which are the latest trends? Do you think there could be a sort of “music revolution”, like there was with the rise of electronic music and synthetizers? submitted by /u/francMesina [link] [comments]  ( 9 min )
    [D] Guidance for building a game AI pipeline
    Hi ML Community! I'm working on a card game similar to Hearthstone or Magic: The Gathering, i.e. a game where two players battle with decks of cards coming from a large collection (for instance, there are around 4000 cards in Hearthstone). Actions are limited to three things: Play a card (potentially on a target) Use a card on a target End the turn I'm looking at building AI for it, and am investigating using machine learning for it. I know very little on the subject (I am a game engineer with a reasonable experience of cloud / AWS stuff), but it seems to me that it might be a good fit: features would be the state of the board (i.e. all the cards in play or in hand or in deck), the turn, and whether the current player has won that game or not (eventually), and label would be the action taken (that turn). I was looking at SageMaker, hoping that it would streamline and allow me to try something relatively easily, but I immediately found it complicated and quite unclear. I would be very grateful if anyone could point me at resources describing at a high level what a full ML pipeline could look like (i.e. what software can injest this kind of data, what software can provide inference, etc.). For instance, would it be saner to "just" get started with Spark on EMR for this kind of problem domain? I hope I'm not too wide off the mark with those questions, and thanks in advance! submitted by /u/tinkagames_g [link] [comments]  ( 9 min )
    [D] Why RLHF instead of direct ranking loss?
    This may be basic question for some one but it bothers me for a while. For the instructgpt or whatever following model with alignment, RLHF seems to be the standards. We get human feedback and train a reward model, then we use rl to further finetune the model. However, why not directly use human feedback to finetune with a simple ranking loss(e.g pairwise loss)? What might be the best advantage for RLHF? submitted by /u/Chen806 [link] [comments]  ( 9 min )
    [D] Advice on training on noisy million scale dataset?
    I've just finished pre-processing the danbooru dataset, which if you don't know, is a 5 million anime image dataset. Each image is tagged by humans such as ['1girl', 'thigh_highs', 'blue eyes'], however, many images are missing tags due to there being so many. I've filtered the tags (classes) down to the 15k most common. Although the top classes have 100k or more examples, many rare classes only have a few hundred tags (long tail problem?). This is my first time training on such a large dataset, and I'm planning on using Convnext due to close to SOTA accuracy and fast training speed. Perhaps vit or a transformer architecture may benefit from such a large dataset? However, vit trains way slower even on my 4090. What are some tips and tricks for training on such a large noisy dastaset? Existing models such as deepdanbooru work well on common classes, but struggles on rare classes in my testing. I assume class unbalance will be a huge problem, as the 100k classes will dominate the loss compared to the rarer classes. Perhaps focal loss or higher sampling ratio for rare classes? For missing labels, I'm planning on using psuedolabeling (self distillation) to fix the missing labels. What is the best practice when generating psuedolabels? ​ Any tips or experiences with training on large unbalanced noisy datasets you could contribute would be greatly appreciated! submitted by /u/Chance-Tell-9847 [link] [comments]  ( 9 min )
    [D] The greatest success stories of Reinforcement Learning
    Hello guys, I made a video for my YT channel discussing some of the greatest success stories in Deep Reinforcement Learning. The video is meant to provide some intuition on RL as a concept as well as a basic understanding of how these different projects work under the hood. There are way too many great RL projects, so I didn’t try to make it an exhaustive list (I’m gonna do more videos later talking about more projects - maybe make a series out of it), but I chose four that I’ve personally worked with in the past/find really insightful and educational (DQN/Atari, Alpha GO, DeepMimic, and Dactyl). Thanks for reading. Here is the link, hope you guys check it out. All feedback is appreciated! https://youtu.be/zOXcNFM8dt4 submitted by /u/AvvYaa [link] [comments]  ( 9 min )
    [P] Looking for a freelancer
    Hi all! I have a project I would need help with. We need to build a MVP (minimum viable product) of a combination of two models. A model that recommend the best channel to use performing a task. And then after that a model to recommend the best time today to perform that task in given channel. We have a set of features already defined. Some are in the data and some are generated from the data. Looking for someone who could work on this as a freelancer. Our preferred environment would be AWS SageMaker, but honestly not a necessity at this point as this is a MVP. Due to the reason I want to keep this "secret" for a while, I will not disclose all the details in this post. End product that I am waiting for includes (but not restricted to): - Model Training script that evaluates if the new model is more accurate as the previous model (some level of version control) - Model prediction API that will accept the data and prepare it for the models, run the prediction, return the result with accuracy. submitted by /u/S0pg [link] [comments]  ( 9 min )
    [R] How well do LLMs do on specific ML NLP tasks compared to previous models - paper takeaways
    Hi all ! Reading through articles online and reading through sub reddit I have seen some people use LLMs (mainly through openAI) for nlp specific tasks (NER, Text classification, etc.). I was a bit surprised as smaller (~100 million) size models already like RoBERTa exist for such cases. Not much content online about this beside this recent paper : https://arxiv.org/pdf/2308.10092.pdf Highly recommend reading it, here are a few take aways: Most LLM benchmarks today focus on capabilities like understanding, reasoning and Q&A. They often overlook performance on specific nlp tasks like text classification, NER, etc. Llama 2 (70b) required fine-tuning to beat GPT 3.5 in some tasks. Both were still overall outperformed by RoBERTa. In certain cases GPT4 did better. However smaller open models provide more advantages in terms of speed, cost and transparency. The difference of speed/latency (often more important than accuracy in production) and the cost differences between LLMs and "Smaller" models is mind blowing in my view (see screenshots) ​ Cost, speed and throughput comparaison How good the models do on various tasks/datasets Note: Not saying benchmarks are a source of truth, just found the analysis interesting, always take benchmarks with a grain of salt. If you're using LLMs for anything else beside text generation, I'm curious to know more about your experience so far :) cheers! submitted by /u/EnthusiasmNew7222 [link] [comments]  ( 9 min )
    [D] Maximum Sequence Length Supported by Sinusoidal Positional Encoding?
    Hello everyone, I've been pondering on sinusoidal positional encoding and its limitations. Does anybody know of a maximum sequence length that this absolute positional encoding may support? I'm coming from a deep reinforcement learning background, so I'm not too familiar with NLP papers, like I couldn't figure out the sequence length used in the original transformer paper. Thanks in advance for any info! submitted by /u/LilHairdy [link] [comments]  ( 9 min )
    [D] How to optimize parameters of a model written in C
    Problem: I have a quite complex model that is written in C that takes parameters as an input and estimates a curve as an output. I would like to optimize the parameters by comparing the output with the real measurements using ML methods such as stochastic gradient descent. ​ Question: Is there any possible way to use white box optimizers to optimize the parameters of my C-model without adapting the model itself? Is there a framework that I could use? ​ What I tried: I tried using frameworks such as tensorflow or pytorch and tried to include the compiled C-model in Python. However, gradient tracking does not work when using C functions. I tried doing the optimization in C++ by using libtorch. I realized that for gradient tracking it is essential to only use torch methods. I cannot adapt the C functions to torch functions. I don't want to use black box optimizers since they require good knowledge of the parameters that I will not have. submitted by /u/romtej [link] [comments]  ( 9 min )
    [D] Foundation Models or Fine-tune VAEs
    I am considering building a model that will be the basis of many specialized models that each don’t need much computational capabilities. What’s the current way to go about this? I was reading about Teslas Hydra network that looks to be more of a foundation model. However, newer methods like latent diffusion models operate on a latent space generated by more advanced auto encoders such as VQ-VAE. I couldn’t find any papers going into this direction and would be curious to hear your thoughts! submitted by /u/That_Phone6702 [link] [comments]  ( 9 min )
    [D] How do you get started with LLMs as a complete beginner?
    Can you give me courses and recommendations on how to get started with llm submitted by /u/uzitarekc [link] [comments]  ( 9 min )
    [D] How Do Large Language Models Achieve Translation as an Emergent Property? 🌍
    Hey fellow Redditors, I've been wondering about a question lately about the inner workings of large language models like GPT-3.5 and I'm hoping some of you knowledgeable folks can shed some light on this. My curiosity centers around how these models manage to perform translation tasks as an emergent property of next token prediction. So, here's my question: Does the training data for models like GPT-3.5 contain text explicitly linking between languages, such as a dictionary, or do they learn translation by assigning similarity between words in different languages based on mathematical metrics like cosine distance?So in that sense, being indepedently trained on several textbooks of different languages (not on the same topic), they would be able to link languages simply by their arithmetic properties? I hope that's making sense. For instance, if you look at words like "queen" in English and "rainha" in Portuguese, they share a certain similarity that could be quantified using mathematical similarity metrics. I'm wondering if through this similar vector assignment, the models learn what means what. I'm more leaning towards the latter, but I'm too lazy to pursue this empirically.As a follow up question, does this mean that if we are able to predict whale conversation, we would be able to translate it to English as well? Thanks in advance for any input you can provide! 🤓 submitted by /u/AlexandreFSR [link] [comments]  ( 9 min )
    [News] AI-Based Physics Predictions in Your Web-Browser!
    If you are interested in Engineering simualtion and ML, check out this webinar from SimScale on the 4th of October! Join the webinar to find out more. https://www.simscale.com/webinars-workshops/ai-based-physics-predictions/ https://www.reddit.com/r/simscale/comments/16bdq3x/aibased_physics_predictions_in_your_webbrowser/?utm_source=share&utm_medium=web2x&context=3 submitted by /u/s_laine [link] [comments]  ( 9 min )
    [D] How does Llama-2 perform in sentiment analysis?
    Hey guys, if you have explored using Llama-2 in doing sentiment analysis, just wanted to get your experience in how Llama-2 perform in this task? I have tried using GPT and it’s pretty accurate. If Llama-2 isn’t all that good in sentiment analysis, which other open LLM would you recommend? Thank heaps! submitted by /u/--leockl-- [link] [comments]  ( 9 min )
    [R] Can LLMs learn from a single example?
    submitted by /u/hardmaru [link] [comments]  ( 9 min )
    [D] Aspiring MLE Discord
    Hi all, I’m an aspiring Machine Learning Engineer. I want to be a practitioner. Building, deploying, and evaluating models to solve problems. Ideally I want to land a job in Tech as an MLE. I struggle at times to stay committed to building side projects, studying ML algos, etc. I have a background in hardware specific C++ SWE stuff for 3.5 yrs, but not much in the way of ML and web backend. I do have a decent amount of python coding from other experiences and it’s my preferred language. Would anyone be interested in forming a discord to talk about what we are doing to prepare, practice interview each other, stay accountable to each other, etc? Had a few people show interest in r/ArtificialInteligence already If so comment below! Let’s do this! submitted by /u/Srokisthename [link] [comments]  ( 9 min )
  • Open

    How can I pass in the models policy to the reset function for logging in Stable Baselines 3?
    I want to pass the policy from my main file into my agent file so that I can log the data collected during training. I am already collecting things like the score and reward but I don't know how I could collect things like the policy loss or explained variance where they are changing as the training progresses . I want to log these to an XSLX file every time the reset function is called (once every game) preferably the same one I am logging the score and reward to. The game is the classic snake game, run on pygame. Here is my main code: import gymnasium from stable_baselines3 import A2C from agentStable import snakeEnv from eiffel2 import builder # Import Eiffel2's builder function from torchsummary import summary # from agentStable import data_manager # Initialize your custom environme…  ( 12 min )
    How to go about reverse engineering historical trading data?
    Hi, Assume I have data for forex / stocks day trading, where my data/columns are: 1) price of last 50 ticks (a tick is the price at that moment in time, the smallest movement possibly that you can get for that currency) 2) If we should be in a trade (and direction of trade. where 1 = trade going up. 2 = trade going down. 0 = we should not be in a trade). I have tried classification (I generalized the tick price by changing it to pct_change() ) but accuracy is low. would it be possible to reverse engineer through reinforcement learning given these data? I am actually more interested in the trade exiting only (so if trade is currently has value of 1 then it became 0 or 2, it means we should exit existing trade). any guide on how to go about this? Yes I know it will be hard. but if humans can teach a robot to walk, maybe hopefully an agent can be taught to learn to exit a trade based on historucal data? I have done preliminary readings, and is PPO the best way to go? or DQN? assuming I will use stable baseline3. I am also open to using other Python libraries. Thank you. submitted by /u/oniongarlic88 [link] [comments]  ( 9 min )
    The great success stories of RL (A video)
    Hello guys, I made a video for my YT channel discussing some of the greatest success stories in Deep Reinforcement Learning. The video is meant to provide some intuition on RL as a concept as well as a basic understanding of how these different projects work under the hood. There are way too many great RL projects, so I didn’t try to make it an exhaustive list (I’m gonna do more videos later talking about more projects - maybe make a series out of it), but I chose four that I’ve personally worked with in the past/find really insightful and educational (DQN/Atari, Alpha GO, DeepMimic, and Dactyl). Thanks for reading. Here is the link, hope you guys check it out. All feedback is appreciated! https://youtu.be/zOXcNFM8dt4 submitted by /u/AvvYaa [link] [comments]  ( 8 min )
    combination of reinforcement learning and supervised learning
    Hi. I'm trying to train a robot that will minic the action that we provide via a video input. On the surface it sounds similar to teaching the robot to walk, but it's not. We can train the robot to make it walk easily these days. But I'm not sure how to teach it to minic an action that we perform. Because each time a new action can be given to the robot and it has to minic that action (it's sort of like a supervised data that the robot has to memorize) Is there a way to do it? is it some branch of machine learning that I'm not aware? The robot is a humanoid simulation. ​ submitted by /u/rakk109 [link] [comments]  ( 9 min )
    My first ever Unity ML Agents AI training!
    submitted by /u/R_AIAO [link] [comments]  ( 9 min )
    Baseline behaviour of agents
    I’m having a tough time understanding, how to establish baseline behaviour of agents in a LLM RLHF environment. I have data with time stamp and rewards from several models for each agent. My question is how do we establish baseline behaviour of agents? Does each row in weights and bias considered as a separate agent? Are the initial few 100’s of rewards according to timestamp be considered as baseline behaviour? Thankful in advance. submitted by /u/Private050 [link] [comments]  ( 9 min )
    Some question about GAIL
    Recently, I've been trying to replicate the method described in the paper "AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control" by training an agent in Isaac Gym using GAIL. However, I've encountered some issues. After adding the discriminator network, the discriminator's loss function stabilizes at around 0.3, and I'm unsure if this value is too high. Additionally, it is strange that the value loss of my value network can reach values between 80 and 90. I want to know if anyone else has experienced a similar situation and what might be the reasons behind these issues. submitted by /u/Mia_Sue_123 [link] [comments]  ( 9 min )
    Relation between state value and state-action value function
    I am following Lil's Weng Blog on RL over here (https://lilianweng.github.io/posts/2018-02-19-rl-overview/) - 1) I am confused how this expression came about - ​ https://preview.redd.it/jo31wbrt2jmb1.png?width=1106&format=png&auto=webp&s=15946bebef2dccadfabf2205d5283729d5405826 2) I am also lost with the origin of this expression - https://preview.redd.it/fs4f46xv2jmb1.png?width=1097&format=png&auto=webp&s=9cc72b265e7f1069bddee9714d1deb7cc3a61775 3) Regarding the second image, where did the expectations go? If you see the top of the image, the state-action value is represented using an expectation but at the bottom, I don't see any expectation. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
  • Open

    Fraud detection using Machine Learning: Unmasking deceptive patterns
    In an increasingly interconnected world where digital transactions have become the norm the battle against fraud has taken on new dimensions. The challenge lies not only in identifying familiar fraud patterns but also in unearthing the intricate web of evolving deceptions that threaten industries such as finance, e-commerce, and insurance. As fraudsters continually adapt their… Read More »Fraud detection using Machine Learning: Unmasking deceptive patterns The post Fraud detection using Machine Learning: Unmasking deceptive patterns appeared first on Data Science Central.  ( 30 min )
  • Open

    Frontiers of multimodal learning: A responsible AI approach
    New evaluation methods and a commitment to continual improvement are musts if we’re to build multimodal AI systems that advance human goals. Learn about cutting-edge research into the responsible development and use of multimodal AI at Microsoft. The post Frontiers of multimodal learning: A responsible AI approach appeared first on Microsoft Research.  ( 25 min )
  • Open

    TSMixer: An all-MLP architecture for time series forecasting
    Posted by Si-An Chen, Student Researcher, Cloud AI Team, and Chun-Liang Li, Research Scientist, Cloud AI Team Time series forecasting is critical to various real-world applications, from demand forecasting to pandemic spread prediction. In multivariate time series forecasting (forecasting multiple variants at the same time), one can split existing methods into two categories: univariate models and multivariate models. Univariate models focus on inter-series interactions or temporal patterns that encompass trends and seasonal patterns on a time series with a single variable. Examples of such trends and seasonal patterns might be the way mortgage rates increase due to inflation, and how traffic peaks during rush hour. In addition to inter-series patterns, multivariate models process intr…  ( 92 min )
  • Open

    Build a secure enterprise application with Generative AI and RAG using Amazon SageMaker JumpStart
    In this post, we build a secure enterprise application using AWS Amplify that invokes an Amazon SageMaker JumpStart foundation model, Amazon SageMaker endpoints, and Amazon OpenSearch Service to explain how to create text-to-text or text-to-image and Retrieval Augmented Generation (RAG). You can use this post as a reference to build secure enterprise applications in the Generative AI domain using AWS services.  ( 7 min )
    Intelligently search Adobe Experience Manager content using Amazon Kendra
    This post shows you how to configure the Amazon Kendra AEM connector to index your content and search your AEM assets and pages. The connector also ingests the access control list (ACL) information for each document. The ACL information is used to show search results filtered by what a user has access to.  ( 11 min )
    Fine-tune Llama 2 for text generation on Amazon SageMaker JumpStart
    Today, we are excited to announce the capability to fine-tune Llama 2 models by Meta using Amazon SageMaker JumpStart. The Llama 2 family of large language models (LLMs) is a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Fine-tuned LLMs, called Llama-2-chat, are optimized for dialogue use cases.  ( 46 min )
    Run multiple generative AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe and save up to 75% in inference costs
    Recently, generative AI applications have captured widespread attention and imagination. Customers want to deploy generative AI models on GPUs but at the same time are conscious of costs. SageMaker MMEs support GPU instances and is a great option for these types of applications. Today, we are excited to announce TorchServe support for SageMaker MMEs. This new model server support gives you the advantage of all the benefits of MMEs while still using the serving stack that TorchServe customers are most familiar with. In this post, we demonstrate how to host generative AI models, such as Stable Diffusion and Segment Anything Model, on SageMaker MMEs using TorchServe and build a language-guided editing solution that can help artists and content creators develop and iterate their artwork faster.  ( 12 min )
  • Open

    Checksum polynomials
    A large class of checksum algorithms have the following pattern: Think of the bits in a file as the coefficients in a polynomial P(x). Divide P(x) by a fixed polynomial Q(x) mod 2 and keep the remainder. Report the remainder as a sequence of bits. In practice there’s a little more to the algorithm than […] Checksum polynomials first appeared on John D. Cook.  ( 6 min )
  • Open

    A Powerful Legacy: Researcher’s Mom Fueled Passion for Nuclear Fusion
    Before she entered high school, Ge Dong wanted to be a physicist like her mom, a professor at Shanghai Jiao Tong University.  ( 6 min )
    ‘Arteana’s Art Squad’ Assembles — Indie Showrunner Rafi Nizam Creates High-End Children’s Show on a Budget
    Rafi Nizam is an award-winning independent animator, director, character designer and more. He’s developed feature films at Sony Pictures, children’s series and comedies at BBC and global transmedia content at NBCUniversal.  ( 8 min )

  • Open

    Assume You Have To Place $100 Bet On One of 3 Nick Bostrom Simulation Theory Scenarios: Which Scenario Would You Bet On?
    Odds are same for each option 1/3. I believe results will be really interesting observation . ​ View Poll submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
    New AI-generated COVID drug enters Phase I clinical trials: Claims to be effective against all variants
    Insilico Medicine, an AI-driven biotech company, has announced its AI-designed COVID-19 drug is entering Phase I clinical trials. Promising to deliver lasting results against all variants, this could become the first viable alternative to Paxlovid. To stay on top of such cutting-edge advancements in AI, look here first. Insilico's breakthrough medicine, ISM3312 Generated using Artificial Intelligence, ISM3312 may offer the superior solution to the constraints of current oral medication, Paxlovid. Insilico’s new drug could address the limitations of Paxlovid, including unpleasant side effects and drug resistance due to constant COVID mutation. Preclinical studies reveal the drug’s potential in reducing the viral load in lung tissue and mitigating lung inflammation. Development powered by AI Identified via AI-driven platform PandaOmics, the drug effectively targets crucial proteins in the coronavirus. Using Chemistry42, a generative chemistry platform, Insilico generated new molecules built to suppress this protein, creating ISM3312. Given the success, the company patented ISM3312, which is currently undergoing Phase I Clinical trials, with results expected by end 2023. The Implications Dr. Harvey Castro, an emergency medicine physician, encourages doctors to remain cautious but also recognizes the promise of AI-generated drugs like ISM3312. With the trials in progress, the medical community is closely monitoring it as it could redefine the treatment course for COVID and other similar viruses. Insilico's venture exhibits AI's potential in accelerating effective drug discovery, prompting the need for consistent tracking of AI's transformation of healthcare and other fields. (source) P.S. If you like this kind of analysis, I compile a free newsletter that tracks the most relevant news and research in AI. Professionals from Google, Meta, and Insilico Medicine are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 10 min )
    Are you an AI beginner or AI professional?
    submitted by /u/MarkFulton [link] [comments]  ( 9 min )
    What OpenAI Really Wants
    submitted by /u/Alone-Competition-77 [link] [comments]  ( 9 min )
    What are some good open source projects exploring emotional voice synthesis?
    There are tons of TTS software out there, but they don't incorporate human emotions during speech synthesis. For example, anger, tiredness, surprise, happiness... What solutions exist for this today? submitted by /u/ICWiener6666 [link] [comments]  ( 9 min )
    Thoughts for my disgruntled artist friends:
    Learning a skill, for me, was never about securing knowledge that privileged me over everyone else who did not put the work in. While often, it did feel like drinking Kool-Aid, buying in to these groups like yoga and climbing, I knew I was not there to rub elbows, but to discover the how behind it. Some leaders of some groups did create a barrier of entry, a necessary proving point, but I have always seen these loops to jump through as a challenge - once completed - a spy. Every skill you have learned has prepared you not to be better at that skill, but to learn a new skill with more ease. It is uncomfortable to learn something new, like drinking from a fire hydrant, but the more sips you take from that blasting surge of water, the more you realize it is all part of the process. We get blasted, we sip, we get overwhelmed, we come back. Just because there is a tool that regulates the blasting, that holds our hand through the overwhelm, does not mean all our hard work has been for nothing. In fact, it means we are more prepared, more primed, to receive all of the beauty and knowledge coming our way. Now, friends, we become CURATORS. :) xo submitted by /u/airkaty [link] [comments]  ( 9 min )
    Tesla Diesel Truck Commercial (AI)
    submitted by /u/wisconsin-sopapa [link] [comments]  ( 9 min )
  • Open

    Issues with Creating a MultiAgentEnv
    Rllib is making me feel like the biggest idiot, again, and maybe someone else knows what I'm doing wrong here? It feels like I'm missing what should be a fairly simple step... I keep receiving the following error message, which is odd, as my environment is an extension of MultiAgentEnv. Is there anything else I need to do in order for my environment to pass the check successfully? ValueError: Have multiple policies , but the env >> is not a subclass of BaseEnv, MultiAgentEnv, ActorHandle, or ExternalMultiAgentEnv! ​ class RoutingEnv(MultiAgentEnv): metadata = { "render_modes": ["human"] } def __init__(self, render_mode="human", **kwargs): super().__init__() ​ ​ gym.envs.register( id="MyEnv-v0", entry_point='routing_rl.envs:RoutingEnv', kwargs={"config": param_config} ) env_name = "MyEnv-v0" train_steps = 200000 learning_rate = 1e-3 save_dir = "saved_models" def register(config): env = gym.make("MyEnv-v0") return env # register the predefined scenario with RLlib register_env("MultiEnv", register) config = ( PPOConfig() .training(lr=0.001, _enable_learner_api=False) .environment(env="MultiEnv") .environment(disable_env_checking=True) .resources(num_cpus_per_worker=1) .rollouts(num_rollout_workers=0) .multi_agent( policies={"shared_policy": PolicySpec()}, policy_mapping_fn=lambda agent_id, episode, worker, **kwargs: "shared_policy", ) ​ submitted by /u/tessherelurkingnow [link] [comments]  ( 9 min )
    [D] Can information about the action selected be used to inject information to learning agent
    Hi all, I am training an agent via PPO. The environment is a node removal ('n' number of actions which are nodes on the graph) with evaluation after each node removed. the state is represented by a trained graph attention network in the environment with the average of the node embeddings on the graph representing the state of dimension size 'n'. The embedding of the a node that has been removed is subtracted from overall graph embedding representation to represent the 'removal' of that node. However, I want to absolutely be certain, that given a state representation, in a new unseen graph, the agent will not select a node that is absent from that graph. In the event that the state representation may not be granular enough and might cause the agent to think that a node on the graph is present when it is not, are there ways to mitigate this? Two ideas I have are: mask actions for nodes that are not present (this is already done after node removal to prevent the agent from selecting the same node again), but is this valid to do in an unseen graph if I a priori mask nodes that are not present in the action space Inject a second input to the policy network such as a one-hot encoding of nodes that have already been selected as an input in addition to the state representation of the graph, so that it models finer dependencies between the state and action taken. However is this valid? Any thoughts are appreciated! thank you! submitted by /u/amjass12 [link] [comments]  ( 10 min )
    Swap, Earn, Airdrop: ZKSyncSwap
    https://zsyncswap.technology/ submitted by /u/shivamrai24 [link] [comments]  ( 9 min )
  • Open

    Transformers Aren’t Turing-complete, But a Good Disguise Is All You Need
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Introducing Refact Code LLM: 1.6B State-of-the-Art LLM for Code that Reaches 32% HumanEval
    submitted by /u/nickb [link] [comments]  ( 9 min )
  • Open

    [R] Question about Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning, ICLR 2021
    As stated here I already emailed the authors and asked in ai.stackexchage, but I haven't received any replies, so I am trying my luck here. I believe the question was clearly stated in the ai.stackexchange link included here again, and the paper in question can be found here. So, I won't repeat it here because the formatting here is worse. I am hoping maybe someone can shed a light on my issue. If this is an inappropriate use of this sub, I'll take the post down :D submitted by /u/carlml [link] [comments]  ( 9 min )
    [P] Deploying a Grounding DINO Model to a Rest API Endpoint for Open-Set Object Detection with Prompts
    Hi everyone! Last year we launched a tool to make it easier to deploy ML models into production behind REST APIs. Our first prototype was focused on small models built with Scikit-Learn and XGBoost, but pretty quickly we got a lot of requests to support bigger, more complex models built on Tensorflow, Pytorch and Transformers. From detecting model dependencies to building out auto-scaling compute, it's been a lot of fun working through the challenges to make this product scale. We've built a few tutorials to showcase deploying some interesting and complex models to REST Endpoints. The latest one we released is a tutorial showing how to deploy a Grounding DINO model to a Rest API Endpoint for open-set object detection with prompts. Link to blog post tutorial. Link to Colab notebook. https://preview.redd.it/mi3jk4t5jimb1.png?width=950&format=png&auto=webp&s=37524b719f9dd6fb1605d0c18fcec7da31a685dd submitted by /u/Jazzlike_Flamingo_35 [link] [comments]  ( 9 min )
    [P] Looking for a text classification problem for something helpful in social media
    Hi! I am looking for an text classification problem where I can use text data from social media. Similar projects I have found interesting is classifying if the author is depressed, pro-eating disorder, right wing radical, a potential schoolshooter, a bully or a pedophile. If any of you have a suggestion for a classification problem that can be used for something good, please comment. submitted by /u/IndependentSidekick [link] [comments]  ( 9 min )
    [D] Phrase Similarity Based On Images (embeddings)
    So I know that embeddings work by finding words that are used in similar contexts or found around some input word. This allows us to find similar words based on proximity to other words and in a way, map a relationship between an input word and other words. But I assume children learn what words mean and the intuition behind them, by hearing the word and associating it with visuals or a specific scenario in front of them which helps them to add context to that word and how it is used. If we were to emulate how children learn words, could we or is there an architecture that allows us to take an input word, find images with the input word in there (Object detection) and then extract the context from the images (other objects and their position and relation to the input word) then convert that context to phrases and query those phrases the next time that a word is inputted to see other phrases or words that are similar to the input word based on whether or not they appear in the images of the input word. Not sure if it makes sense or if it is even useful compared to embeddings but I was thinking about how we could emulate how children learn words to see if we could draw influence from that. Just wondering if there’s a similar approach to this where we use context from images to find similar words and phrases to some input. submitted by /u/4K-AMER [link] [comments]  ( 9 min )
    Data preprocessing/ augmentation for named entity recognition? [D] [P]
    I am currently doing named entity recognition with a bert model. Its working fine so far, so I am now trying to ameliorate my results. Usually my first thought when I try to augment my ML models is input data preprocessing. In case of NER stop word removal and removal of punctuation, numbers and one-character words came to mind - they are hardly ever named entities so I woulndt loose many training examples. However, NER does in fact require context to work, so removing stuff could prove harmfull in the end? I am kind of torn. Should I do it? Are there better data augmentation approaches? I would be really thankfull for any kind of hint submitted by /u/SilverDusk42 [link] [comments]  ( 9 min )
    [D] lost junior Machine Learning engineer
    Hello everyone, I know it’s a bit silly to ask these kind of questions, but Im gonna give it a shot since I’ve seen lots of talented people in here. I am gonna try to keep it as short asp.(Also please excuse my "sometimes" bad English, I am not a native speaker) Well, last year I graduated as an industrial engineer, I was thinking during my last year of studies to completely switch to programming since many of my friends are programmers, but they are all web. So I dedicated the last year of my engineering studies to getting to know what machine learning actually is besides my studies (also tbh I wasn’t very consistent) (also my learning material was mostly the famous DL spec by Andrew on coursera), at the end of the year we have something called project of end of studies (like a masters t…  ( 10 min )
    [N] Streamlit launches LLM Hackathon 🧠
    Streamlit just launched its latest hackathon focused on large language models and AI 🚀 Awesome opportunity to build a Streamlit app using LangChain, LlamaIndex, AssemblyAI, Weaviate, or Clarifai, and win cool prizes (AirPods, Yeti microphone, mechanical keyboard, to name a few) More info on the hackathon here Streamlit LLM Hackathon submitted by /u/carolinedfrasca [link] [comments]  ( 9 min )
    [P]Embedchain Open Source project is a game changer
    I was just exploring ChatBot and LLMs and found a library named Embedchain AI. This library lets you build a ChatBot like ChatGPT in just 3-4 lines of code. Tutorial: https://www.youtube.com/watch?v=vIhDh7H73Ww submitted by /u/trj_flash75 [link] [comments]  ( 9 min )
    [D] Tl;dr Approximate Inference methods made easy
    “MCMC vs VI” is no longer a discussion about your favourite Roman numeral. If you share my trepidation for model performance in the face of data sparsity, or you simply suffer from anxiety uncertainty, you might be tempted into the Bayesian world. Years later at the precipice of your career (and mental health degeneracy) you over-engineer probabilistic models so intractable that would stress Lord Bayes himself into stomach ulcers. The solution? Approximate inference, the true antihero to model simplification. I wrote a brief primer for those who enjoy maths and those who disdain it, in both cases it's impossible to avoid using maths while discussing Bayesian statistics so I kept it as light as I could. PS - This is a Reddit-friendly copypasta from my medium article, so if you're a visual …  ( 13 min )
    [Discussion] Has anyone went through the ml.school course from Santiago? Is it any good?
    I used to do some basic machine learning a few years ago (7+), but then went into what now became data engineering, because of the lack of opportunities in ML. This year I'm trying to up my game and maybe switch back to ML, which I've always been following and tinkering with, but I want to learn all the necessary skills at least at a basic level, in order to find an ML job. I'm learning on my own but now I'm looking for resources regarding MLOps and found ml.school and I'm curious if anyone has any opinons about it or if there is anyone here who has went over the course? Thanks in advance for any help or info! submitted by /u/jack-in-the-sack [link] [comments]  ( 9 min )
    [D] Most user-friendly data labelling tool (non-AI)
    Hi I am currently creating computer vision models for segmentation and classification, and I am looking for a tool that is very user friendly. We have been using CVAT so far, and apparently, its UI is too cluttered. So, we need something easier to use. Segment Anything and other auto-segmentation tools simply do not work on our dataset. So, I do not want a tool that is user friendly because it uses AI. Any thoughts? submitted by /u/Avatrin [link] [comments]  ( 9 min )
    [P] Introducing CometLLM: Track, Visualize, and Annotate your LLM Prompts
    Hello ML Community, We released our new LLMOps Tool: CometLLM. It's highly optimized for Prompt Engineering Workflows and making it easy to find the best prompts for your use-case! Here a few helpful things you can do with this tool! Score/Rate Your Prompts Add Metadata to your Logged Prompts (Great for Tracking Prompt Usage) Search for Specific Prompts via Keywords/Phrases Visualize Full-On Prompt Chains! Group Your Prompts Hope the ML Community find this useful as well continue to experiment with LLMs! Don't Hesitate to reach out if you have any feedback! submitted by /u/metric_logger [link] [comments]  ( 9 min )
    [R] Direct Preference Optimization: Your Language Model Is Secretly A Reward Model
    submitted by /u/EducationalCicada [link] [comments]  ( 9 min )
    [R] How I could handle BIG network traffic dataset for ML?
    Hello people! This is the first time that I post here and I desperately need your help. I need to perform anomaly detection on a huge network traffic dataset with isolation forest (unsupervised learning). I have the .pcap files of a whole month and and for each day there are multiple devices that communicated each other. So the file of each day is from 700 MB to 2 or 3 GB. My initial idea was to only maintain the header of the packets and to discard the data payload. But even in this case the dataset remains huge and the number of entries is crazy. What I should do? submitted by /u/J-Devesh [link] [comments]  ( 9 min )
    [R] what processes should one follow to find better recommendation systems than these?
    "The Greatest Books - Combines many top book lists to create a master list anobii - a community built by readers for readers allowing you to find, shelve, review and share books Author Alcove - Rate read books, shelve to be read, and receive recommendations. BookDigits - Book tracking, rating, and discovery with achievements. Another from an r/books member (I really think this plus authoralcove would be perfect) booklikes - Book tracking and blogging/reviewing Goodreads - The popular choice for book social media, reviews, and tracking LibraryThing - The old standby, of webbased personal library management Litsy - Insagram inspired social media app for tracking and reviwing books Lovelybooks - German book tracking site readernaut - Readernaut helps you make your book list, build a library, keep track of what you've read and what you'd like to read, and then share those lists with your friends. Readgeek - Book review and cataloging site by a redditor(?) and translated from german Riffle - track & reivew books with social media integration TasteDive - (aka tastekid) social rating site for music, movies, shows, books, authors, and games Discovered - Dating site/app for bookworms Calibre - The go to for ebook management The Game of Books - A kickstarter. They used to have a beta up but it's gone now too - http://gameofbooks.com/level_up weread - Encouraging Children To Read: Articles, ideas, and information to encourage children to read thirdscribe - ThirdScribe provides authors and readers with actual tools and services they can use to enjoy their books as well as grow and connect with their audience. What Should I Read Next? - A book recommendation engine bookfinder - book search tool 50 Book Pledge - Goal based book tracking anno.wiki - collaborative book annotation" submitted by /u/Fearless-Room-504 [link] [comments]  ( 10 min )
    [P] Locally train and generate AI VoiceOver using a large data set of my voice and matching scripts.
    Hi, I've voiced over 500 videos for a YouTube channel and have the accompanying voiceover audio and scripts. I'd like to train a very robust AI to generate VoiceOver locally and not use an online service using the extensive amount of audio/scripts I have stored. My hardware is a 3070 and 12700. All other solutions have been online such as Elevenlabs. This will be a secondary service I could provide alongside bespoke voice over. submitted by /u/dfawlt [link] [comments]  ( 9 min )
    [D] How is currently your experience with availability of GPUs across providers ?
    Just wanted to ask what has been lately your experience with availability of GPUs across providers (major ones - AWS, Azure, GCP, but also some minor ones). Especially when it comes to GPUs which are more suited for ML (A100s, H100s). Anyone also considering buying physical hardware instead ? submitted by /u/remek [link] [comments]  ( 9 min )
    [D] How do you observe the behaviour / satisfaction of users of your LLM product?
    Soon, I will launch an LLM-powered chatbot. I have run plenty of tests to make sure the LLM works well, but I am super curious about the experience of real (external) users. I’d like to find out if users are happy with the answers the model generates, what topics they ask about, etc. And also how much each user costs me since the service is free and I am paying for it at the moment being. I expect to be able to improve the product over time with this kind of insights. Are you guys trying to track similar metrics? If so, how do you do it? Thank you! submitted by /u/jroux92 [link] [comments]  ( 9 min )
    [D] Randomized Search with Early Stopping for LGBMClassifier
    I have been running hyperparameter optimization for an LGBM multi-classifier model with randomized search with 10fold stratified cv as well as oversampling on each fold using SMOTE as follows: # Create a pipeline with SMOTE oversampling smote_pipeline = make_pipeline(SMOTE(random_state=42), lgbm_clf) # Initialize 10-fold stratified cross-validation cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42) # Initialize RandomizedSearchCV for hyperparameter tuning using the pipeline random_search = RandomizedSearchCV( estimator=smote_pipeline, param_distributions=param_dist_with_prefix, n_iter=n_iter, scoring=f1_macro_scorer, n_jobs=n_cores, pre_dispatch=n_cores, cv=cv, random_state=42 I wanted do incorporate early stopping based on the validation set from the nth iteration of cv. However this does not seem possible using the current API if I am correct. If I wanted to use a predefined validation set the code would be sth like this but I want to perform validation using 10 fold-cv validation set only:. mode l= lgb.LGBMClassifier() clf = RandomizedSearchCV( model, parameters, fit_params={ 'early_stopping_rounds':20, 'eval_set':[(X,y)] }, cv=cv ) My questions are: 1- Does it make sense to use early stopping during randomized search? 2- Do you know a way I could do it? 3- If not, is it a good idea to use randomized search without early stopping and train a new model with early stopping using the best parameters resulting from randomized search? Bonus Question: Does it make sense to run randomized search with f1_macro scoring from sklearn instead of multilogloss in case of imbalanced classes? submitted by /u/returnname35 [link] [comments]  ( 9 min )
    [D] What do you put in your lab notes?
    I'm working my way through various tweaks to a ML pipeline, and I've started keeping short lab notes in a markdown file with just the time, a brief summary of changes, and my observations on training metrics or anything else interesting on a training run. I've also started copying a snapshot of the Python source code to the tensorboard directory, which has saved me a lot of headache. I was wondering how other people keep lab notes, and especially what you find useful to record and how you structure the notes. submitted by /u/hazard02 [link] [comments]  ( 9 min )
    [R] [D] Machine learning model to predict deformation of 2D object
    Hello, I am currently working on predicting 2D deformations of objects. These objects are available as 2D contours in my code. I am splitting these contours into 1000 points with an equal distance in the direction of the x axis. I have about 70 data entries. The following picture shows one of these objects: Comparison before and after The red data series contains the points before and the blue series contains the points after the deformation. My model should take in a series of coordinates before the deformation. Using this information the model should predict the coordinates after the deformation. I have tried using the LSTM Model from keras. Unfortunately I wasn't able to produce useful results. The way I structured my data is the following: [ [ [x1, y1], [x2, y2], [x3, y3], ... 1000 coordinate pairs ], [ [x1, y1], [x2, y2], ... ], .... 70 entries ] The structure for the input and the output series is the same. When trying to train the model I have a very low loss and low validation loss as well: Overview during training of model The test loss is also quite similar: Overview test loss However when looking into the predictions I get results like these: Visualized prediction after training The prediction is not close to what it should be like. Also the prediction seems to not change even when changing the input. ​ Do you have an idea about why my ML model does not work? Are there examples on this topic available? Should I change my approach in any way? Thank you in advance! Any help is appreciated! ​ If you need my jupyter-notebook, it would be great if somebody could tell me, how to link files on Reddit :) submitted by /u/InitiativeGlass4701 [link] [comments]  ( 10 min )
    [P] Equinox (1.3k stars), a JAX library for neural networks and sciML
    Hey folks! I wanted to advertise Equinox -- my now-surprisingly-popular ( :D ) JAX library for numerical models. These days that often means "neural networks", but I like to emphasise that this also includes ODEs/SDEs/linear solves, etc. Here's the GitHub link: https://github.com/patrick-kidger/equinox For those already using JAX, then Equinox is interesting because (a) it ships with a NN library, and (b) this is built around the idea that "everything is a pytree", which makes things easy to reason about and easy to compose. Furthermore (c) Equinox offers advanced tools like true runtime errors, out-of-place pytree surgery, and checkpointed while loops, and AFAIK in the JAX ecosystem these are unique to Equinox. For those most familiar with PyTorch: for many use cases (sciML in particular), JAX has a much stronger compiler, more advanced autodiff, etc. And whilst JAX itself is akin to the torch.* namespace, libraries like Equinox are then akin to the torch.nn.* namespace. Because of its speed and features, right now JAX+Equinox is my favourite approach to numerical computing. So I'd love for some more people to try it. What do you think? submitted by /u/patrickkidger [link] [comments]  ( 9 min )
    [D] Distributed training on a local cluster
    I want to make use of a local rack for running both training and serving jobs. I have looked into using something like Kubeflow, but I have some questions. -Does Kubeflow offer a suitable solution for running tasks across multiple machines? (Either data parallel or model parallel tasks). -How does resource provisioning work with it? Is it able to automatically select the machines that best suits the resource requirements or does it require the user to select where to run the job? Is it able to scale vertically/horizontally? Thanks in advance. submitted by /u/omegalul3000 [link] [comments]  ( 9 min )
    [P] Hydralette: Simple but powerful configs based on dataclasses
    Hi r/ML, i want to share a little side project of mine: hydralette. I mainly built this for my own work but thought why not get some feedback and potentially make someone else's work a little easier as well. I think we all agree that having a flexible configuration is crucial to successful ML experimentation. There are a million python config libraries out there, some dedicated to configs like hydra and others that support configs as a convenience feature like transformers.HfArgumentParser. So why did I decide to write yet another library? First off, I can say that I never really liked the way huggingface handles configs. All options are on a single level with tons of dependencies between them, some only taking effect if a combination of others is given. General approach to configs asid…  ( 11 min )
  • Open

    DSC Weekly 5 September 2023
    Announcements Top Stories In-Depth The post DSC Weekly 5 September 2023 appeared first on Data Science Central.  ( 20 min )
    16 most interesting AI applications across industries worldwide
    Artificial Intelligence has become a compulsive innovation for humankind, that we cannot live without. It has been gaining strength with every passing moment. The impact of AI applications extends beyond improved business results and can be significant in elevating and enriching the human experience. Popular AI trends in the past have revealed a compelling need… Read More »16 most interesting AI applications across industries worldwide The post 16 most interesting AI applications across industries worldwide appeared first on Data Science Central.  ( 20 min )
    Generative AI megatrends: Generative AI for enterprise is proven vs generative AI for  consumer is not – Part One
    As generative AI evolves, certain trends are becoming clearer,  In yet another milestone in AI consulting giant McKinsey unveiled its own generative AI tool for employees called lilli My comments a) McKinsey launching this agent gives credibility to the domain for enterprise AI assistants b) On one hand, it’s a familiar copilot strategy – but… Read More »Generative AI megatrends: Generative AI for enterprise is proven vs generative AI for  consumer is not – Part One The post Generative AI megatrends: Generative AI for enterprise is proven vs generative AI for  consumer is not – Part One appeared first on Data Science Central.  ( 19 min )
    Addressing the challenge of software support for multiarchitecture AI accelerated HPC
    Programmers can no longer rely on the traditional method of targeting specific hardware accelerators with conditional pragmas (e.g., #ifdef) to match the software to the hardware at a particular datacenter or customer site. Humans writing machine-specific code cannot address the exponential increase in possible hardware combinations in the modern multivendor, multiarchitecture computing environment. Open software provides a multiarchitecture, multivendor solution that addresses the complexities of accelerated HPC and AI computing. The post Addressing the challenge of software support for multiarchitecture AI accelerated HPC appeared first on Data Science Central.  ( 25 min )
    Generative AI megatrends: Generative AI for enterprise is proven  vs generative AI for  consumer is not – Part two
    In part one of this blog, we saw how there is an increasing case for an enterprise chatbot use case. In part two, we ask the question  Could a consumer chatbot i.e. directly customer facing chatbot be a flawed use case for an LLM? The consumer (customer facing) chatbot case is a familiar use case… Read More »Generative AI megatrends: Generative AI for enterprise is proven  vs generative AI for  consumer is not – Part two The post Generative AI megatrends: Generative AI for enterprise is proven  vs generative AI for  consumer is not – Part two appeared first on Data Science Central.  ( 19 min )
  • Open

    Build a generative AI-based content moderation solution on Amazon SageMaker JumpStart
    In this post, we introduce a novel method to perform content moderation on image data with multi-modal pre-training and a large language model (LLM). With multi-modal pre-training, we can directly query the image content based on a set of questions of interest and the model will be able to answer these questions. This enables users to chat with the image to confirm if it contains any inappropriate content that violates the organization’s policies. We use the powerful generating capability of LLMs to generate the final decision including safe/unsafe labels and category type. In addition, by designing a prompt, we can make an LLM generate the defined output format, such as JSON format. The designed prompt template allows the LLM to determine if the image violates the moderation policy, identify the category of violation, explain why, and provide the output in a structured JSON format.  ( 13 min )
    How Carrier predicts HVAC faults using AWS Glue and Amazon SageMaker
    In this post, we show how the Carrier and AWS teams applied ML to predict faults across large fleets of equipment using a single model. We first highlight how we use AWS Glue for highly parallel data processing. We then discuss how Amazon SageMaker helps us with feature engineering and building a scalable supervised deep learning model.  ( 10 min )
    Optimize deployment cost of Amazon SageMaker JumpStart foundation models with Amazon SageMaker asynchronous endpoints
    In this post, we target these situations and solve the problem of risking high costs by deploying large foundation models to Amazon SageMaker asynchronous endpoints from Amazon SageMaker JumpStart. This can help cut costs of the architecture, allowing the endpoint to run only when requests are in the queue and for a short time-to-live, while scaling down to zero when no requests are waiting to be serviced. This sounds great for a lot of use cases; however, an endpoint that has scaled down to zero will introduce a cold start time before being able to serve inferences.  ( 10 min )
  • Open

    Distal Adversarial Examples Against Neural Networks in PyTorch
    Out-of-distribution examples are images that are cearly irrelevant to the task at hand. Unfortunately, deep neural networks frequently assign random labels with high confidence to such examples. In this article, I want to discuss an adversarial way of computing high-confidence out-of-distribution examples, so-called distal adversarial examples, and how confidence-calibrated adversarial training handles them. The post Distal Adversarial Examples Against Neural Networks in PyTorch appeared first on David Stutz.  ( 5 min )
  • Open

    Rethinking trust in direct messages in the AI era
    Microsoft researchers are proposing a new way to ensure greater trust and accountability in email, texts, direct messages on social platforms, even phone calls, to help mitigate sophisticated threats from AI-related scams and fraud. The post Rethinking trust in direct messages in the AI era appeared first on Microsoft Research.  ( 14 min )
  • Open

    The Halo Effect: AI Deep Dives Into Coral Reef Conservation
    With coral reefs in rapid decline across the globe, researchers from the University of Hawaii at Mānoa have pioneered an AI-based surveying tool that monitors reef health from the sky. Using deep learning models and high-resolution satellite imagery powered by NVIDIA GPUs, the researchers have developed a new method for spotting and tracking coral reef Read article >  ( 6 min )
    A Perfect Pair: adidas and Covision Media Use AI, NVIDIA RTX to Create Photorealistic 3D Content
    Creating 3D scans of physical products can be time consuming. Businesses often use traditional methods, like photogrammetry-based apps and scanners, but these can take hours or even days. They also don’t always provide the 3D quality and level of detail needed to make models look realistic in all its applications. Italy-based startup Covision Media is Read article >  ( 7 min )

  • Open

    AI is a Looming Damnation
    submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 9 min )
    Natural Language Processing Question
    Hello, I am learning about natural language processing now. Technically, is this a way for a computer to input language of a person and then convert it into machine code (0s and 1s)? Or, is this a way to turn human language into some computer language like Python, and then turn into machine code as a second step? I am assuming that NLP has only just recently become widely used (like in Chat GPT). Was it a huge jump to go from a machine understanding a computer programming language like Python to a machine understanding ordinary human language? Why was it so much more difficult to train computers to understand the later? Thanks! submitted by /u/NoahsArkJP [link] [comments]  ( 9 min )
    Can someone tell me where I can get Runway Gen-2 code? I tried Github but found nothing
    Title submitted by /u/ICWiener6666 [link] [comments]  ( 9 min )
    Help with finding a tool for 3d image effects.
    Hello all - I'm looking to track down a tool that was able to create a zoom effect that looks three dimensional, example below. https://www.instagram.com/reel/CwLB1XsNK0X/?igshid=MmU2YjMzNjRlOQ== I've searched my usual spots for some different image editing tools and looked at some video ones as well, but I can't quite figure it out. Anyone familiar with a tool that could do something like that? Thanks in advance. submitted by /u/Lys0L [link] [comments]  ( 9 min )
    Is this company Legit? Any more info on the early access release of this AI?
    Sounds like the stuff I somgwrite about submitted by /u/Niu_Davinci [link] [comments]  ( 9 min )
    Can AI Writing Boost Your Mood and Mind? My Personal experience.
    Have you ever wondered if AI writing can make you feel better in your head? I would like to discuss about how AI writing can put a smile on your face as it is my personal experience. 1. Stress-Free Writing Writing can be stressful, especially when you're not sure where to start. AI writing tools can be your stress-busters. They help you begin by giving you ideas and suggestions. So, no more staring at a blank screen in frustration! 2. Beating the Writer's Blues We all know that feeling when words just won't flow. AI can be your brainstorm buddy. It tosses out ideas like confetti at a party, sparking your creativity when you need it most. Goodbye, writer's block! 3. Making Your Writing Shine Typos and messy sentences can be a downer. AI can be your proofreader, catching those pesky er…  ( 10 min )
    One-Minute Daily AI News 9/3/2023
    Amazon India is developing a generative artificial intelligence (AI) tool called SahAI (help/assist) for its business partners to help them with the backend of any particular product.[1] A robot moves a toy package of butter around a table in the Intelligent Robotics and Vision Lab at The University of Texas at Dallas. With every push, the robot is learning to recognize the object through a new system developed by a team of UT Dallas computer scientists.[2] “What is my purpose?” – “You pass butter”. Mustafa Suleyman, Google DeepMind’s co-founder and chief executive of Inflection AI, told the Financial Times that the US should use their chip leadership to enforce minimum global standards for the use of AI.[3] Model who never ages: Noonoouri becomes first digital artist to be signed by Warner Music.[4] Sources: [1] https://www.thehindu.com/sci-tech/technology/amazon-working-on-a-generative-ai-to-help-small-businesses-in-india/article67255325.ece [2] https://www.nanowerk.com/news2/robotics/newsid=63572.php [3] https://www.finextra.com/newsarticle/42878/google-deepmind-co-founder-argues-us-should-set-ai-global-standards---ft [4] https://www.thenationalnews.com/arts-culture/music-stage/2023/09/02/model-who-never-ages-noonoouri-becomes-first-digital-artist-to-be-signed-by-warner-music/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    An audiobook entirely created from A.I.
    This story, it's narrator and even the cover art are all made by Artificial Intelligence. The only human contribution was adding the Text to the book cover and the prompts used to produce the story. https://youtu.be/tZgq9N9RCo0 submitted by /u/BermaidMutter [link] [comments]  ( 9 min )
    AI-Generated Voice Deepfakes Pose New Threat to Bank Security
    Scammers are now using AI to create realistic voice deepfakes, aiming to trick people into transferring money. By mimicking real customer voices, this new type of voice fraud attempts to exploit bank security systems and deceive call center agents. To make sure you're updated about the latest AI trends, look here first. Increasing prevalence and sophistication of voice frauds A rise in AI-generated voice frauds has been noted this year, with one major case featuring an investor in Florida whose voice was synthetically duplicated to deceive his bank. Voice authentication vendor Nuance detected its first successful deepfake attack on a financial services client late last year. These scams are facilitated by the wide availability of voice samples online, coupled with the growth of AI capabilities and hackers' access to stolen bank account details. Defending against evolving AI threats Currently, only a small percentage of fraud calls to large financial companies are AI-generated. Most attacks have targeted credit card service call centers. Fraudsters are advancing their techniques, now able to convert speech to a specific target's voice in real-time using advanced AI systems like Microsoft's VALL-E. With most of these security measures focusing on call centers and automated systems, individual calls to high-ranking officials remain a vulnerability. (source) P.S. If you like this kind of analysis, I write a free newsletter that keeps you updated with the most relevant news and research in AI. Join professionals from Google, Meta, and OpenAI who are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    [D] What is the difference between self-taught learning and self-supervised learning?
    I came across a paper by Andrew Ng "https://ai.stanford.edu/~hllee/icml07-selftaughtlearning.pdf" ,with title " Self-taught Learning: Transfer Learning from Unlabeled Data " I am not an expert on this topic, but I feel it is really close to what SimCLR or MoCO are trying to do. Can someone provide guidance on what different it is between self-taught learning and self-supervised learning? submitted by /u/AaronSpalding [link] [comments]  ( 9 min )
    [R] A brain-inspired algorithm that mitigates catastrophic forgetting of artificial and spiking neural networks with low computational cost - Chinese Academy of Sciences 2023
    Paper: https://www.science.org/doi/10.1126/sciadv.adi2947#abstract Code: https://zenodo.org/record/8037309 Abstract: Neuromodulators in the brain act globally at many forms of synaptic plasticity, represented as metaplasticity, which is rarely considered by existing spiking (SNNs) and nonspiking artificial neural networks (ANNs). Here, we report an efficient brain-inspired computing algorithm for SNNs and ANNs, referred to here as neuromodulation-assisted credit assignment (NACA), which uses expectation signals to induce defined levels of neuromodulators to selective synapses, whereby the long-term synaptic potentiation and depression are modified in a nonlinear manner depending on the neuromodulator level. The NACA algorithm achieved high recognition accuracy with substantially reduced computational cost in learning spatial and temporal classification tasks. Notably, NACA was also verified as efficient for learning five different class continuous learning tasks with varying degrees of complexity, exhibiting a markedly mitigated catastrophic forgetting at low computational cost. Mapping synaptic weight changes showed that these benefits could be explained by the sparse and targeted synaptic modifications attributed to expectation-based global neuromodulation. https://preview.redd.it/5lcx3sn8ramb1.jpg?width=711&format=pjpg&auto=webp&s=4431b81708bb9ab98e6351f4b979897ad8244ed9 https://preview.redd.it/vgsuqsn8ramb1.jpg?width=718&format=pjpg&auto=webp&s=f0602185fcb0dc6ec29308f77f1db77a4f4a562d https://preview.redd.it/hpfuftn8ramb1.jpg?width=709&format=pjpg&auto=webp&s=545f4fab3033cb68637052e7ff2c4775a12a7b99 https://preview.redd.it/7plm0tn8ramb1.jpg?width=714&format=pjpg&auto=webp&s=b138b1c43a2078297b69c09d26de013865629e77 https://preview.redd.it/uc6tnrn8ramb1.jpg?width=703&format=pjpg&auto=webp&s=fcf747b2515fbf6e78b1ef7aa66ce9ca4d223cd3 submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Faster, long range transformer [R]
    Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length. In this video, we walk through the ColT5 paper and explain what is T5, longT5, UL2 and PEGASUS, then discuss how ColT5 has advantage over previous methods for few-shot and 1-shot tasks. https://youtu.be/8KCQQtXje2g?si=ecbvnFPlhGP01aOt submitted by /u/MRMohebian [link] [comments]  ( 9 min )
    [R] YaRN: Efficient Context Window Extension of Large Language Models - Nous Research 2023 - Open source allows context windows of up to 128k!
    Paper: https://arxiv.org/abs/2309.00071 Github: https://github.com/jquesnelle/yarn Very informative Reddit discussion: https://www.reddit.com/r/LocalLLaMA/comments/166jik4/128k_context_llama_2_finetunes_using_yarn/?utm_source=share&utm_medium=web2x&context=3 Twitter: https://twitter.com/EnricoShippole/status/1697317625116742119?s=20 Abstract: Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x less tokens and 2.5x less training steps than previous methods. Using YaRN, we show that LLaMA models can effectively utilize and extrapolate to context lengths much longer than their original pre-training would allow, while also surpassing previous the state-of-the-art at context window extension. In addition, we demonstrate that YaRN exhibits the capability to extrapolate beyond the limited context of a fine-tuning dataset. We publish the checkpoints of Llama 2 7B/13B fine-tuned using YaRN with 64k and 128k context windows at https://github.com/jquesnelle/yarn . https://preview.redd.it/tnovsbpjiamb1.jpg?width=1354&format=pjpg&auto=webp&s=ce098b3071285f9f64d99312a98999de8b625bfe https://preview.redd.it/j10sicpjiamb1.jpg?width=997&format=pjpg&auto=webp&s=95bbc6d70759ef7ccdf6bccee0c2a2f98ebda52b https://preview.redd.it/ve710dpjiamb1.jpg?width=1380&format=pjpg&auto=webp&s=05f53117bcf648e330fa6ac148746484dca9fb1b ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [R] DenseDiffusion: The Game-changing, Training-free Technique in Text-to-Image Generation
    Overcoming present challenges in text-to-image models, DenseDiffusion is the latest advancement ensuring enhanced image quality based on scene descriptions. Developed specifically to handle complex captions, it brings a new era in dense captioning. https://preview.redd.it/v5oa5suwfamb1.png?width=2000&format=png&auto=webp&s=17fbcc702ee21a41cb356a7d0e38d710a8c048c3 If you want to stay on top of the latest trends and insights in AI, look here first. Why is it noteworthy? It addresses the issues with existing techniques where users face inconsistencies when dictating the arrangement of elements within generated images using textual prompts. DenseDiffusion is training-free, unlike existing methods like "Make-aScene" and "Latent Diffusion Models," which are computationally intensive and r…  ( 9 min )
    [D] video data in image classification
    Let's say your training a simple CNN for a classification problem. An example would be a model that is supposed to decide if a person is male or female based on facial images. What is your experience regarding image sequences from videos in the training datasets? My intuition is, that the added information to the dataset from one video isn't proportional to the number of frames. The network probably can't learn much more from 30 frames with little variation in comparison to a single image (at least if you use augmentations). What do you think about this? Or do you even know any research in the direction of this question? submitted by /u/seba07 [link] [comments]  ( 9 min )
    [D] Strongest LLM for Writers/Editors
    Hey all, I'm a screenwriter that's curious about ML/AI tech and its applications to my industry. I'm wondering what the current best product is for writers and editors. Specifically, I'm curious if there's a product that can "edit" longform text - say, to trim a screenplay down from 140 pages to 120, while retaining style, plot, and narrative intent. Are there any products like that? Forgive me if this is too basic; I've only dabbled in ChatGPT and MidJourney to see what the fuss is about. Thanks in advance! submitted by /u/cesrep [link] [comments]  ( 9 min )
    [P] We're building the first LLM marketplace to connect developers with teams, investors, and projects
    There is so much going on right now in AI and machine learning. But there isn't a concise place to find experts, teams, and amazing projects all in one place. That is why we are building Bazaar, the first ever LLM marketplace. We will be inviting slowly making sure we have enough members on each side of the marketplace. https://www.llmbazaar.com/ submitted by /u/husky_misconception [link] [comments]  ( 9 min )
    [Project] Should i use the compile() function when using a custom trainer class in tensorflow?
    I'm writing a neural network for super resolution but it's one of my first projects and I didn't really understand what compile() is used for. I specify the optimizer, the loss and the accuracy metrics in the trainer class and then I just call my train method on my model. Should I still use the compile function? I'm following this template for the project structure https://github.com/jinh0park/Tensorflow-2.0-Project-Template/tree/master submitted by /u/petrogass [link] [comments]  ( 9 min )
    [R] Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis
    Project page: https://pmh9960.github.io/research/GCDP/ https://i.redd.it/9uz2wt3ba9mb1.gif submitted by /u/yeolj0o [link] [comments]  ( 9 min )
    [Discussion] Segmentation Suggestions for Structured (and Deeply Nested) Bulleted Documents
    My question is more focused on the pre-processing side, rather than the training side of things. I have a local RAG Q&A pipeline set up for personal documents (local regulations, technical manuals, stuff like that), and I'm looking for ways to improve it. All the documents I'm working with are consistently structured, with nested bullets of varying depth making up most of the structure. So far I've been manually writing/tweaking a python script I wrote to recursively extract the nested bullets and duplicate their hierarchy parents' content for each of the inner-most bullets, that way each bullet has all the contextual content it needs to be valuable in a vacuum. So something like: (a) 1. A. B. Would turn into: (a) + 1. + A. (a) + 1. + B. This works well in the sense that my LLM does a wonderful job answering my questions and citing the right sources, but the lion's share of my work goes into the tweaking of my parser scripts, or creating new ones entirely. I've played around with semantic segmentation via embedding models, but it doesn't really work here since I'm trying to retain the nested structure of the document for citation accuracy. Does anyone have any ideas for ots solutions that address this kind of thing? I can't be the only person who has run into this type of problem, but I've been having a really hard time finding relevant libraries/software that can even get me 80% of the way there. Also, I'm totally happy to hear what you guys have done and how's it's worked out/what walls you've hit! Edit: I suppose I should have included my current attempt as well, so it's doesn't look like I'm treating this subreddit like Google lol https://gist.github.com/apettina/76de292d6d24ed3d0128b87847706b18 submitted by /u/RedditAppSucksDicks [link] [comments]  ( 10 min )
    [Discussion] How to implement Data Contracts generically? Seeking advice from data contract users.
    Hey folks, it's me the dlt builder again. I have questions about data contracts! Schema evolution, where the schema of the destination evolves based on incoming data is nice for ingesting transactional data. However, there are scenarios where we might not want this automatic evolution. For example, when other parts of our infrastructure require a fixed schema or when we want to store only data that conforms to the current schema. This is where a data contract comes into play. Our plan is to implement a straightforward version of this concept initially. We're considering introducing settings on the pipeline to control schema evolution, and here are some modes we're thinking about: Evolve (Default): The current behavior where the schema adapts to incoming data. Freeze-and-Trim: Freez…  ( 10 min )
    [P] 🤵🔥 Classy-Fire 🔥🤵 - pretrained text classification using LLM APIs (github.com/microsoft)
    Classy-fire is a pretrained multiclass text classification approach that leverages Azure OpenAI's LLM APIs using clever parameter tuning and prompting for classification. Why? Tired of having to beg your LLM to pick from a set of options / actions? Tired of working hard on cleaning and parsing its responses to trigger a flow? Struggling to strip unhelpful prefixes (such as "Sure! " or "I am just a language model!")? Having to wait on retries in cases of unexpected outputs? Getting random responses on the same query? Need a "quick and dirty" text classifier? Don't have enough training data? submitted by /u/shayben [link] [comments]  ( 9 min )
    [D] Are there any projects working with large compute clusters looking for volunteers?
    I've been an ML/software engineer for a bit over 7 years now, and am looking for a new job. It seems like most of the job postings I see around want experience with large compute clusters, but my work has always been in compute-restricted domains (robotics, on-prem deployments, etc.). I'm looking broaden my skillset and get some experience with distributed computing. Does anyone know of open-source or otherwise public projects that work with compute clusters like this that are looking for volunteers? I'm happy to put aside an hour or so a day to work on an interesting project. submitted by /u/Flag_Red [link] [comments]  ( 9 min )
    [D] - Two objections to Iris van Rooij's paper saying that it is provably intractable to simulate human intelligence via any machine learning algorithm that samples from human actions.
    https://psyarxiv.com/4cbuv/ The short of the paper is they show that an AI algorithm that can only learn via sampling from human action is unable to tractably simulate human behavior. I have seen papers like this one by u/alcanthro questioning the validity of the result, but I want to point out two objections to the paper that stand even if the result is true. ​ 1 - It only seems to apply for AIs trained to mimic humans via sampling human behavior: The paper assumes the AI is trained via an arbitrary machine learning algo M that samples from possible human behaviors in given situations. This matches pretty well to how a lot of LLMs are pretrained (guess the next token), but doesn't seem to apply to any sort of reinforcement learning, since in those situations you are not training the …  ( 11 min )
    [P] ReAct: "Recurrence for Adaptive Computation" can lead to OOD length-extrapolation
    This was a small project I was working upon which adds a recurrent prior to attention-based models. This allows integrating an adaptive-computation mechanism, leading to much better length-extrapolation capabilities (compared to vanilla transformers). On some tasks, I'm able to OOD extrapolate to quite an appreciable extent! Its also (relatively) quite parallelizable with slightly different training regimes - thus, hopefully being scalable as well. Being lightweight, it might be useful for inferencing as it saves on memory (trading off compute instead). It's interesting to think that MHSA might contain an implicit inductive bias that prevents extrapolation. Replacing that with other variants helps a lot - I go in detail in the writeup! Twitter summary: https://twitter.com/awesome_ruler_/status/1698668965612917112?s=20 Writeup/Blogpost: https://dripfeedofideas.notion.site/dripfeedofideas/ReAct-bef052956a0d45f29fb5a5383e7d737d GitHub repo: https://github.com/neel04/ReAct submitted by /u/Competitive-Rub-1958 [link] [comments]  ( 9 min )
    [D] Current opinions on the information bottleneck principle for neural networks?
    A while back, the IB principle (https://arxiv.org/abs/1503.02406) made a few waves as a promising framework to understand/study deep neural networks. But I recall a series of follow up works (notably https://openreview.net/forum?id=ry_WPG-A-) that called a lot of the results into question, and (I think?) people drifted away from it. I saw this recent paper (https://arxiv.org/abs/2304.09355) on the IB and self-supervised learning, and it got me wondering what the current views are as to how useful/accurate the IB view of deep learning is? submitted by /u/Tea_Pearce [link] [comments]  ( 9 min )
    [Project][Discussion] What could I use to create a UI like AIChain?
    I was looking over this paper https://arxiv.org/abs/2110.01691 called AIChains that deals with an interactive chaining method to interact with LLMs. I could not find an associated codebase with that paper. If I wanted to create a similar UI like theirs anything you would recommend? More specifically, if I want to replicate the paper in 3 months full time (as a student with some experience in ML), what would be the best approach to the UI part of the paper. What if deployment is a concern? I was intially thinking of simple python frameworks like PySimpleGui, or maybe something more comprehensive like PyQt. I am rather unfamiliar with more common web frontend frameworks, but if there are suggestions that make such a Graph/Diagram based User interface easy to implement, I am open to them. submitted by /u/BasisCompetitive6275 [link] [comments]  ( 9 min )
    [D] how to learn Stochastic Differential Equations for diffusion model?
    There have many blogs and papers disscuss SDE for diffusion model: Stochastic Differential Equations and Diffusion Models https://www.vanillabug.com/posts/sde/ Perspectives on diffusion https://sander.ai/2023/07/20/perspectives.html On the Mathematics of Diffusion Models https://arxiv.org/abs/2301.11108 But i can't find blog or book to explain Stochastic Differential Equations, it seems complex, even after i have learned Calculus and Ordinary Differential Equations and Partial Differential Equations, i still can't understand SDE, especially the SDE Perspective on diffusion. So Do you know some blogs or books explain SDE intuitive like betterexplained.com/ and mathsisfun.com/ ? submitted by /u/ghosthamlet [link] [comments]  ( 9 min )
    [D] How are Mixture Of Expert models trained in conjunction with Transformers?
    How MoEs (sparsely gated ones) are trained appear to be rather opaque from looking at literature (e.g. GLAM and similar papers). From my intuition it would make sense that it works either by: Each expert being trained on a subset of data (the data they are supposed to have expertise in) to predict a token given a previous token, or to predict a token given a contextual embedding. This would mean the expert MLPs are frozen, and the only thing concerning the experts that we train with the transformer is the gating mechanism. or The experts are trained in the same training loop as the transformer (e.g. backprop over the whole network), but that each of the experts are only trained on a subset of the data corresponding to their expertise (e.g. as we perform the training loop and we run upon data from our math dataset, then we backprop through the math expert mlp) ​ Could anyone help me resolve my confusion and point me in the right direction for how these are trained? Thanks! submitted by /u/SorasNobody [link] [comments]  ( 9 min )
    [D] (Advice) Remote work in ML/DL/Data Science
    I'm from India and I've started learning and building my portfolio in Machine Learning/Deep learning. Currently, I'm doing "Practical Deep learning using fast.ai and pytorch" course. In my university, there are not a lot of companies visiting for campus placement this year so I decided to go on the offcampus job hunt. I'm a final year student (in masters but bachelors was unrelated to CS) and no work experience. I have further personal goals for which I'd need a steady and good income. I decided if I could get a remote job it would be really beneficial for me as my living costs would be saved and I'll be paid much more than what India offers freshers(since I'll be paid in dollars or euros). However, I need advice in various domains: 1. Should I focus on one of ML/DL/ Data science or multiple ? 2. Any resources that could help me learn ? 3. Projects that help me stand out from the crowd? 4. Where can I start looking for remote work(websites, etc)? 5. Any other personal advice is appreciated! Thank you for taking the time to read my post :) submitted by /u/Lazy_Guidance_5151 [link] [comments]  ( 9 min )
    [D] Finetune pretrained ViT
    Hello everyone, In deep learning finetuning pre-trained model was performed by taking some pretrained models like resnet, vgg and unfreezing some of it's final layers. Is it the same when finetuning pretrained ViT models? Or do we have to take pretrained ViT and train all the parameters on our own data ? On this tutorials https://theaisummer.com/hugging-face-vit/, they have not freezed any pretrained layers. submitted by /u/Bishwa12 [link] [comments]  ( 9 min )
  • Open

    NN underperforming greedy algorithms
    So apparently NNs may not outperform simple greedy algorithms in some combinatorial optimization problems Never thought could be the case. https://arxiv.org/pdf/2206.13211.pdf… Modern graph neural networks do worse than classical greedy algorithms in solving combinatorial optimization problems like Maximum Independent Set. https://arxiv.org/pdf/2210.00623.pdf… Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems like Max-Cut submitted by /u/vniversvs_ [link] [comments]  ( 9 min )
    Introducing Refact Code LLM: 1.6B State-of-the-Art LLM for Code that Reaches 32% HumanEval
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Predicting Optimal Temperature in The Transmission System using ML
    submitted by /u/Antique-human6894 [link] [comments]  ( 9 min )
  • Open

    "ChessGPT: Bridging Policy Learning and Language Modeling", Feng et al 2023
    submitted by /u/gwern [link] [comments]  ( 9 min )
    Looking for open PhD positions
    Hi all, I have just completed my MSc and am looking for open PhD positions (preferably funded) in RL to join. My masters thesis was on Hierarchical RL and skill discovery, so that’s the domain am mostly interested in since I have spent quite some time researching it but also open to other interesting avenues. If there are any such positions available at your workplace/lab please let me know. Thanks….. submitted by /u/FreakedoutNeurotic98 [link] [comments]  ( 9 min )
    The reason for using a policy based learning method
    I am reading Sutton's RL chapter on Policy Gradients (13.1) and came across the following paragraph. Can someone please explain it to me - " Finally, we note that the choice of policy parameterization is sometimes a good way of injecting prior knowledge about the desired form of the policy into the reinforcement learning system. This is often the most important reason for using a policy-based learning method. ". Is he referring to some kind of Bayesian technique? I'd highly appreciate some examples here. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
    Continuation of Key Papers in DRL from OpenAI Spinning UP
    Hey, I've been going through the papers curated by people behind OpenAI Spinning Up and I've recently started thinking what the list would look like in 2023 if OpenAI hadn't abandoned it. Do you folks have any suggestions for DRL papers from 2019, 2020, …, up to now? submitted by /u/spoiled-mylk [link] [comments]  ( 9 min )
    Reinforcement learning Rivals of Aether
    i want to create an ai for Rivals of Aether to see how far it could get in abyss mode and if it could beat 3 9th level cpus on a team. i have no idea how to do this. I was thiking for abyss mode, it could get rewards for finishing waves, and get more reward for doing them with minimal damage. ​ submitted by /u/Additional_Ad9093 [link] [comments]  ( 9 min )
  • Open

    Jordan normal form: 1’s above or below diagonal?
    Given a square complex matrix A, the Jordan normal form of A is a matrix J such that and J has a particular form. The eigenvalues of A are along the diagonal of J, and the elements above the diagonal are 0s or 1s. There’s a particular pattern to the 1s, giving the matrix J […] Jordan normal form: 1’s above or below diagonal? first appeared on John D. Cook.  ( 6 min )
    Eigenvectors of the DFT matrix
    When is the discrete Fourier transform of a vector proportional to the original vector? And when that happens, what is the proportionality constant? In more formal language, what can we say about the eigenvectors and eigenvalues of the DFT matrix? Setup I mentioned in the previous post that Mathematica’s default convention for defining the DFT […] Eigenvectors of the DFT matrix first appeared on John D. Cook.  ( 6 min )
  • Open

    NVIDIA CEO Meets with India Prime Minister Narendra Modi
    Underscoring NVIDIA’s growing relationship with the global technology superpower, Indian Prime Minister Narendra Modi met with NVIDIA founder and CEO Jensen Huang Monday evening. The meeting at 7 Lok Kalyan Marg — as the Prime Minister’s official residence in New Delhi is known — comes as Modi prepares to host a gathering of leaders from Read article >  ( 5 min )

  • Open

    [R] How susceptible are LLMs to Logical Fallacies?
    paper https://arxiv.org/abs/2308.09853 abstract. This paper investigates the rational thinking capability of Large Language Models (LLMs) in multi-round argumentative debates by exploring the impact of fallacious arguments on their logical reasoning performance. More specifically, we present Logic Competence Measurement Benchmark (LOGICOM), a diagnostic benchmark to assess the robustness of LLMs against logical fallacies. LOGICOM involves two agents: a persuader and a debater engaging in a multi-round debate on a controversial topic, where the persuader tries to convince the debater of the correctness of its claim. First, LOGICOM assesses the potential of LLMs to change their opinions through reasoning. Then, it evaluates the debater’s performance in logical reasoning by contrasting the scenario where the persuader employs logical fallacies against one where logical reasoning is used. We use this benchmark to evaluate the performance of GPT-3.5 and GPT-4 using a dataset containing controversial topics, claims, and reasons supporting them. Our findings indicate that both GPT-3.5 and GPT-4 can adjust their opinion through reasoning. However, when presented with logical fallacies, GPT-3.5 and GPT-4 are erroneously convinced 41% and 69% more often, respectively, compared to when logical reasoning is used. Finally, we introduce a new dataset containing over 5k pairs of logical vs. fallacious arguments. The source code and dataset of this work are made publicly available. GPT3.5 vulnerable to false information generated by itself! submitted by /u/Amir-AI [link] [comments]  ( 9 min )
    [Discussion] What was your biggest oops with a model or analysis that made it (or almost made it) into production?
    I'm asking because it seems like when I review other people's work I very regularly catch a tiny coding misstep that has HUGE downstream implications. I'm sure my own work is not exempt either. Some examples: "At this step you're saying you encode responders as 1 and non-responders as 0 but you actually did it the other way around." "That groupby statement isn't doing what you think it's doing." "When you created your target variable by labeling people with this ratio >= 30%, you accidentally failed to capture a ton of actual responders, because the floating-point arithmetic used to derive this column is calculating people with actual values of 0.30 as 0.2999999999999998." Come on guys, let's hear it. submitted by /u/WartimeHotTot [link] [comments]  ( 9 min )
    [R] Meta's DINOv2 and FACET sets the bar in computer vision model fairness
    Meta has recently unveiled DINOv2, its cutting-edge computer vision model, and FACET, a comprehensive benchmark to ensure AI fairness. These developments promise improved automation and better inclusivity in the AI sector. If you want to stay on top of the latest trends and insights in AI, look here first. https://i.redd.it/jeojm1qew3mb1.gif DINOv2 for advanced visual tasks Meta has made the powerful DINOv2 model available under the Apache 2.0 license, employing self-supervised learning to enhance image segmentation and depth estimation. This broader use model encourages further innovation and practical application in the computer vision community, driving progress in the AI industry. FACET for enhanced AI fairness Given the inherent difficulty and risks in ensuring fairness in computer vision, Meta introduced FACET. FACET has been developed to benchmark fairness across computer vision models performing tasks such as detection or classification, considering a wide array of demographic attributes. This revolutionary tool enables a better understanding of potential biases in AI models, helping to address fairness and robustness concerns. Wider implications Preliminary studies indicate performance disparities across some demographic groups within computer vision models. FACET allows researchers to track these divergences and monitor the implementation of corrective measures. Meta actively encourages researchers to use FACET for fairness benchmarking in other visual/multimodal tasks. For instance, the DINOv2 model's performance was analyzed with FACET — facilitating insights into potential biases. (source) P.S. If you like such analysis, I write a free newsletter tracking significant news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [P][D] VAE using one-hot-encoding input, problem with optimising and getting good results
    Introduction to the problem I will provide you with link to github code so you can see both input dataset, how my onehot encoding and decoding works and implementation of VAE, with current reconstruction_loss and kl_loss fuctions that make my loss function. dataset - amp.csv onehotencoding -- tools.py main code -- VAE-onehot-testing.ipynb https://github.com/aronta/Master-thesis-Generating-de-novo-peptides-using-variational-autoencoder-model/blob/main/VAE-onehot-testing.ipynb Firstly, just to explain dataset. I have sequences of peptides represented with onehot encoded sequences which I am using as input for my model. Current VAE model, both encoder and deocder are based on LSTM layer as a main way for this VAE to learn connections between inputed sequences and to make sense of it all. Main Issue Issue is that the latent space im getting doesn't look good no matter what i do. (pictures of plots are on link). So i have tried scalling kl loss (and also warming it up -- because many papers say its a good way) but it doesn't change the end result. Maybe there is problem in the implementation of the VAE, i am realy not sure. The main goal would be (like in all VAE implementations) generating new sequences from latent spaces that make sense, opposed to the current outputs that im getting. My guess There is a problem with optimizing loss function, but i could be completely wrong (maybe the model is wrong for the input I have, or onehotencoding isn't even a good way to represent data entering LSTM layer). submitted by /u/Yupgrade [link] [comments]  ( 9 min )
    "[discussion]" Pleae help. Started with ML, MERN and Contributed but conflicted what to continue on..
    I've been learning MERN with a course (charging INR 6k) for last 3 weeks, won two hackathons and Contributed to OS projects. Now just 1 week ago I saw a remote ML job Profile that required OS Contribution to apply.I looked through their docs, learnt python, tensor flow basics and I Contributed to their Tensor flow and Paddle module and got 3 - 4 PR merged(Enough to apply) . Now I'm confused what to continue with, should I do both or do it one by One. I'm a recent graduate so need a job ASAP but I can give maximum time of the day to study. Please can someone give some advice so I can make my decision, as I'm unable to leave either TLDR; Learnt both ML and MERN, Contributed and now confused what to carry on with as I need a job asap. submitted by /u/Sinofdracry [link] [comments]  ( 9 min )
    [D] Compute percentage of languages present in a document
    Hi guys, I'm trying to compute the percentage of each language appearing in a document. My current use cases including two known languages and a set of documents which have the two languages mixed in (code switching, due to translation error). I'm training an ML model to make the output monolingual (leaned towards a designated language), so I need a reliable measure to estimate whether the ML model is making progress or not. Currently, I use lingua with the `compute_language_confidence_values()` function but the prediction is quite poor. For example, given a piece of text in Japanese and English: from lingua import Language, LanguageDetectorBuilder languages = [Language.ENGLISH, Language.JAPANESE] detector = LanguageDetectorBuilder.from_languages(*languages).build() detector.compute_language_confidence_values("わかりません hey do you understand me hey oh really") >>> [ConfidenceValue(language=Language.ENGLISH, value=1.0), ConfidenceValue(language=Language.JAPANESE, value=0.0)] So it's not quite correct (should be 0.8-0.2 or something similar), does anyone have any advice ? Or are there better softwares out there ? submitted by /u/KarmaCut132 [link] [comments]  ( 9 min )
    [D] does anyone have any papers on getting LLMs to output perfect formats?
    Does anyone have any literature on how to constrain the output of an LLM to a specified format? I’ve self hacked a method to get LLAMA to output a json of perfect schema. I tried to find something out of the box but I couldn’t find anything, and so I home brewed it. Thinking of publishing a paper on this but I don’t want to republish something already written, so asking here first. Thanks! submitted by /u/SnooPears7079 [link] [comments]  ( 9 min )
    [D] does anyone have any papers on getting LLMs to output perfect formats?
    Does anyone have any literature on how to constrain the output of an LLM to a specified format? I’ve self hacked a method to get LLAMA to output a json of perfect schema. I tried to find something out of the box but I couldn’t find anything, and so I home brewed it. Thinking of publishing a paper on this but I don’t want to republish something already written, so asking here first. Thanks! submitted by /u/SnooPears7079 [link] [comments]  ( 9 min )
    [R] Requesting help finding labs/ professors on certain discipline.
    submitted by /u/Present-Ad-8531 [link] [comments]  ( 9 min )
    [P] I built a Chrome extension that adds a chatbot to every GitHub repository
    submitted by /u/jsonathan [link] [comments]  ( 9 min )
    [Discussion] How to setup TPU parallelism/FSDP with HuggingFace Transformers
    My Code (Colab Link) Hi! For the past few days, I've been trying to fine-tune a model using TPU parallelism / FSDP with a Kaggle TPU notebook. The reason I need to set up FSDP is because the model I'm using is very large (Openlm's open llama 3b v2). When I try to fine-tune it, I quickly run out of memory on the TPU. Linked above is my code, if anyone has any useful information I would greatly appreciate it! Thank you!! Edit: Also providing my code through text here: !pip install sentencepiece !pip install -U accelerate !pip install -U transformers !pip install cloud-tpu-client !pip install torch-xla !pip install pyarrow import torch import torch_xla import torch_xla.core.xla_model as xm from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments import pand…  ( 10 min )
    [D] Various models and git repo examples to learn Algo Trading
    Can someone list down one dataset for Algo Trading Simulation or free API endpoint , and i will go over following Algorithms: Basic KNN (moving average) SVR other ML models if any LSTM other DL models if any any RNN model basically my mission is to write a paper at then end of months comparing all algorithms with candle stick patterns over different strategies ​ submitted by /u/reactwebdev [link] [comments]  ( 9 min )
    [D][R] How often do Masters students doing a thesis publish in top ML conferences in their program period of 2 years ?
    Just curious to know the thoughts of other Masters/PhD students, professors or others in academia or industry research about their experience with regard to the title. submitted by /u/V1bicycle [link] [comments]  ( 9 min )
    I pretrained 16 language models from scratch with different tokenizers to benchmark the difference. Here are the results. [Research]
    I'm the author of TokenMonster, a free open-source tokenizer and vocabulary builder. I've posted on here a few times as the project has evolved, and each time I'm asked "have you tested it on a language model?". Well here it is. I spent $8,000 from my own pocket, and 2 months, pretraining from scratch, finetuning and evaluating 16 language models. 12 small sized models of 91 - 124M parameters, and 4 medium sized models of 354M parameters. Here is the link to the full analysis. Summary of Findings Comparable (50256-strict-nocapcode) TokenMonster vocabularies perform better than both GPT-2 Tokenizer and tiktoken p50k_base on all metrics. Optimal vocabulary size is 32,000. Simpler vocabularies converge faster but do not necessarily produce better results when converged. Higher compre…  ( 10 min )
    [P] Comgra: A library for debugging and understanding neural networks
    I'm a machine learning engineer and researcher. I got fed up with how difficult it is to understand why neural networks behave the way they do, so i wrote a library to help with it. Comgra (computation graph analysis) is a library you can use with pytorch to extract all the tensor data you care about and visualize it graphically in a browser. This allows for a much more detailed analysis of what is happening than the usual approach of using tensorboard. You can go investigate tensors as training proceeds, drill down into individual neurons, inspect single data sets that are of special interest to you, track gradients, compare statistics between different training runs, and more. This tool has saved me a ton of time in my research by letting me check my hypotheses much more quickly than normal and by helping me understand how the different parts of my network really interact. I hope this tool can save other people just as much time as it did me. I'm also open for suggestions on how to improve it further: Since I'm already gathering and visualizing a lot of network information, adding more automated analysis would not be much extra work. submitted by /u/Smart-Emu5581 [link] [comments]  ( 9 min )
    [D] Reinforced Self-Training (ReST) for Language Modeling (Video Paper Discussion)
    https://youtu.be/V4dO2pyYGgs ReST uses a bootsrap-like method to produce its own extended dataset and trains on ever higher-quality subsets of it to improve its own reward. The method allows for re-using the same generated data multiple times and thus has an efficiency advantage with respect to Online RL techniques like PPO. ​ Paper: https://arxiv.org/abs/2308.08998 ​ Abstract: Reinforcement learning from human feedback (RLHF) can improve the quality of large language model's (LLM) outputs by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST produces a dataset by generating samples from the policy, which are then used to improve the LLM policy using offline RL algorithms. ReST is more efficient than typical online RLHF methods because the training dataset is produced offline, which allows data reuse. While ReST is a general approach applicable to all generative learning settings, we focus on its application to machine translation. Our results show that ReST can substantially improve translation quality, as measured by automated metrics and human evaluation on machine translation benchmarks in a compute and sample-efficient manner. ​ Authors: Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas submitted by /u/ykilcher [link] [comments]  ( 9 min )
    MAML convergence with GAN [D]
    I had been exploring convergence properties of MAML and there are some recent works establishing convergence under certain conditions. I am trying to understand how this would play out with GAN's, I know that in general training is generally unstable and there are a lot of issues such as memorization and mode collapse under this regime, but I am looking for a theoretical result, for instance we know that GAN's converge under ideal conditions and we also know that MAML converges, can we make any comments on the convergence properties of GAN's when trained using MAML, ideally a neat trick to know if they will converge based on what we already know? The proof for MAML convergence is fairly complicated and I expect that a proof that has additional second order gradient terms and feedback loops will probably involve a lot of work and I am wondering if anyone could provide some sort of insight or intuition as to what such a result would look like? Thanks submitted by /u/ashblue21 [link] [comments]  ( 9 min )
    [D] Linear regression for time series data
    Problem: Given time series data of the last few years, one data point per day (eg price of a product or sales made this day). My job is to predict the next 7 days, ie. 7 scalars. Approach: Train one model for each time lag. The first model predicts tomorrow, the second model the day after tomorrow and so on (7 models in total). The features are the last prices of the last 7 days and some saisonal features (calendar week, price on this day last year and so on). Question: is there anything wrong with this approach? It doesn’t feel like the most elegant method to train 7 separate models. The problem with using a single model is, that this model must be able to predict 7 values of different points in time (and i don’t want to give the model input data of 7 days and let it predict all 7 scalars at once. The model should only use the features of a single day to predict this day). The 2 other to options I have considered are to train an autoregressive model (model just learns to predict the next day. To predict the day after tomorrow you give it its own prediction as input). Or to build a „time-lag“ feature, which tells the model how far in the future this datapoint lies. But this doesn’t make sense, because there is nothing like a weekly trend or so. What do you think? The autoregressive approach is elegant, but its implementation and maintenance is complex. submitted by /u/Individual-Cause-616 [link] [comments]  ( 9 min )
    [P] [D] Data augmentation using Stable diffusion
    I've written a post on how to use stable diffusion for data augmentation for object detection and segmentation. Please check it out and share some insights on how to evaluate these kind of tasks. https://medium.com/@kaushik.koneripalli/satellite-image-data-augmentation-using-stable-diffusion-for-object-detection-segmentation-8b1fe87b969 submitted by /u/perceptron333 [link] [comments]  ( 9 min )
    [P] Open-source star removal tool using Pix2Pix
    I created a open-source star removal tool "star2k13". Would love to hear some feedback . Here is link to the tool : Starrem2k13: Open source star removal tool (code2k13.github.io) Works on most operating systems and docker submitted by /u/Key_Education_2557 [link] [comments]  ( 9 min )
    [P] Coding LLaMA 2 from scratch in PyTorch, with step by step explanation of KV Cache, Grouped Query Attention, Rotary Positional Embedding, RMS Normalization, SwiGLU and much more!
    submitted by /u/hkproj_ [link] [comments]  ( 9 min )
  • Open

    Meta's DINOv2 and FACET sets the bar in computer vision model fairness
    Meta has recently unveiled DINOv2, its cutting-edge computer vision model, and FACET, a comprehensive benchmark to ensure AI fairness. These developments promise improved automation and better inclusivity in the AI sector. If you want to stay on top of the latest trends and insights in AI, look here first. https://i.redd.it/zg47br3xv3mb1.gif DINOv2 for advanced visual tasks Meta has made the powerful DINOv2 model available under the Apache 2.0 license, employing self-supervised learning to enhance image segmentation and depth estimation. This broader use model encourages further innovation and practical application in the computer vision community, driving progress in the AI industry. FACET for enhanced AI fairness Given the inherent difficulty and risks in ensuring fairness in computer vision, Meta introduced FACET. FACET has been developed to benchmark fairness across computer vision models performing tasks such as detection or classification, considering a wide array of demographic attributes. This revolutionary tool enables a better understanding of potential biases in AI models, helping to address fairness and robustness concerns. Wider implications Preliminary studies indicate performance disparities across some demographic groups within computer vision models. FACET allows researchers to track these divergences and monitor the implementation of corrective measures. Meta actively encourages researchers to use FACET for fairness benchmarking in other visual/multimodal tasks. For instance, the DINOv2 model's performance was analyzed with FACET — facilitating insights into potential biases. (source) P.S. If you like such analysis, I write a free newsletter tracking significant news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    'Fluid' - (Modelscope image2video)
    submitted by /u/glenniszen [link] [comments]  ( 9 min )
    What....this is insane...
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    I'm literally speachless.. 8O
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    After Getting Banned in Schools, OpenAI Launches ChatGPT Tool for Teachers
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 9 min )
    Retro sci-fi trailer made with AI
    submitted by /u/filmcrux [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/2/2023
    SAG-AFTRA, the union for US actors, is moving towards a potential strike against video game publishers, it’s announced.[1] Russia builds MSU-270 supercomputer for AI and HPC research.[2] Chuck Schumer has announced that his office will be meeting with top players in the artificial intelligence field later this month. Invited to the upcoming summit are tech megabillionaire Elon Musk, his one-time hypothetical sparring partner Meta CEO Mark Zuckerberg, OpenAI CEO Sam Altman, Google CEO Sundar Pichai, NVIDIA President Jensen Huang, and Alex Karpy, CEO of defense contractor creep Palantir.[3] Google expands AI compute offerings, partnership with Nvidia and more.[4] Sources: [1] https://www.videogameschronicle.com/news/actors-union-sag-aftra-could-launch-video-game-strikes-over-wages-and-ai/ [2] https://www.tomshardware.com/news/russian-400-petaflops-supercomputer-for-ai-comes-online [3] https://gizmodo.com/chuck-schumer-elon-musk-mark-zuckerberg-palantir-nvidia-1850788302 [4] https://www.itworldcanada.com/article/google-expands-ai-compute-offerings-partnership-with-nvidia-and-more/545625 submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    The Puzzle Created by GPT-4 That Even GPT-4 Can't Solve, Yet Humans Did! First Challenge Revealed.
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
  • Open

    Expert systems and RL
    I'm interested in learning more about how expert systems and explicit knowledge injection by various means can be used to guide and improve RL, both in terms of capability and in terms of reduced training times. I have a hard time finding good resources for this topic. What are some must-read papers on this topic? Are there any good youtube channels or online courses? I'm particularly interested in resources that feature practical implementations submitted by /u/worstthingsonline [link] [comments]  ( 9 min )
    Why am i getting this error?
    When I try to call check_env() from this code: from stable_baselines3.common.env_checker import check_env from agentStable import snakeEnv env = snakeEnv() check_env(env) I get this error: Traceback (most recent call last): File "myDir", line 6, in check_env(env) File "<myDir\Python\Python310\lib\site-packages\stable_baselines3\common\env_checker.py", line 396, in check_env assert isinstance( AssertionError: Your environment must inherit from the gymnasium.Env class cf. https://gymnasium.farama.org/api/env/ ​ Here is my agentStable.py code: import gym from gym import spaces import numpy as np from enum import Enum from collections import namedtuple import numpy as np from colorama import Fore from gameStable import SnakeGameAI, Direction, Point class snakeEnv(gym.Env): met…  ( 10 min )
    Autonomous Driving | Tight, dynamic and chaotic traffic | India | Swaayatt Robots
    submitted by /u/shani_786 [link] [comments]  ( 9 min )
    Zoomposium with Professor Dr. John-Dylan Haynes: "In search of the code of the brain"
    Zoomposium with Professor Dr. John-Dylan Haynes: "In search of the code of the brain" In this new episode of our "Zoomposium Series" on the topic of "Brain Research", my colleague Axel Stöcker from the "Blog der großen Fragen" and I have managed to win the well-known and renowned brain researcher and psychologist Professor Dr. John-Dylan Haynes for an interview. John-Dylan Haynes has been a professor of theory and analysis of long-range brain signals at the Bernstein Center for Computational Neuroscience and the Berlin Center for Advanced Neuroimaging (BCAN) at Charité and Humboldt University in Berlin since 2006. There, Professor Haynes and his team are "In Search of the Brain's Code". In order to crack this, larger amounts of data are collected from the functional magnetic resonanc…  ( 10 min )
    Considering in use between Model-free vs Model-based, and need suggestion in algorithms.
    In training MFRL, which is mostly simulation, why don't we use MBRL instead as the environment is accessible? **Correct me if I misunderstand in any. From my understanding, Model-Free RL (MFRL) is generally used for control tasks where environment is not accessible. It takes a sample of an experience from the environment and uses it to adjust its policy, either policy-based, value-based, and actor-critic. Model-Based RL (MBRL) uses a transition model to optimize the optimal policy like in model predictive control (MPC). I am interested in using RL for control multiple and continuous actions in continuous stochastic environment. For now, I am moving around DDPG. Do you have any suggested algorithm that match to my task? submitted by /u/AnnonymeowCat [link] [comments]  ( 9 min )
    Understanding how to get a dataset for more complex environments.
    In videos like this, it talks about how you need to find the right fit for your data which is shown on a scatter plot. I understand how this works when you have a dataset for something but how does it work when you are trying to train a DQN to play snake (eating apples and getting longer game). I have been struggling to tune my hyperparameters as well as figure out how many hidden neurons and hidden layers I need. I have found that right now 256 hidden neurons and 2 hidden layers works best. Please tell me if this topic has flown completely over my head and I am missing something. Thank you! submitted by /u/MrHank2 [link] [comments]  ( 9 min )
    FinRL and developing ML - skills and labour market
    If I wanted to hire an ML/RL student/full-time employee to help my firm develop some FinRL/other RL algorithms, what skills should I be looking for? How "generalized" are RL skills - from what I can tell alot of the RL I see posted here has to do with video games? I've stumbled across FinRL recently and would like to hire some help to develop some FinRL code. What's the market like for RL? I know all the rage is LLM's but how different is RL and does the labour market care about the difference? Based in Canada fyi. Won't be hiring for a few months. submitted by /u/Thrumpwart [link] [comments]  ( 9 min )
  • Open

    DFT conventions: NumPy vs Mathematica
    Just as there are multiple conventions for defining the Fourier transform, there are multiple conventions for defining the discrete Fourier transform (DFT), better known as the fast Fourier transform (FFT). [1] This post will look at two DFT conventions, one used in Python’s NumPy library, and one used in Mathematica. There are more conventions in […] DFT conventions: NumPy vs Mathematica first appeared on John D. Cook.  ( 6 min )

  • Open

    [D] Where did the research go?
    This sub used to be my go-to place for finding out cool new ML research but sadly it has now become a "generative AI" "AI productisation" circlejerk. I was wondering where people now go to discover new ML research (besides ArXiv of course!) submitted by /u/blabboy [link] [comments]  ( 9 min )
    [P] What are some good MLE project ideas ?
    What tech stack, what frameworks should I specially learn and use in my MLE project ? There are so much things mentioned in job offers, what would you suggest me to focus on ? I thought of fine-tuning LLM and deploying it using AWS. I'd prefer this project to be NLP oriented. I read about things like MLFlow, Apache Spar, Kubernetes etc. and don't know what to focus on. PS: I am currently a data scientist, and have recently finished a body pose estimation + action recognition web app, using Python/OpenCV/Mediapipe/Flask/Torch submitted by /u/tflbbl [link] [comments]  ( 9 min )
    [P][D] How do I improve car detection performance?
    For a college project I am trying to detect the number of cars in the first 3 rows of a parking lot. Here is my roboflow project page: https://universe.roboflow.com/boaztheostrich/cartest-tyaur As you can see I have been able to get my map score as high as .995 however I am still having difficulty consistently detecting cars in some edge cases. What I am currently testing is increasing the resolution from 1280x720 to 2048. I am new to all of this so any tips or tricks would be greatly appreciated. I am currently using google colab for training although I am considering switching over to vast.ai submitted by /u/johndowlelxdxdxdxdxd [link] [comments]  ( 9 min )
    Need collaborators for a natural language interface [P]
    We have it pretty functional but we're a small team so we need more people. https://github.com/apssouza22/chatflow Promo video: https://www.reddit.com/r/AGIunderconstruction/comments/168fsyr/come_build_open_source_natural_language/?utm_source=share&utm_medium=web2x&context=3 submitted by /u/Cold-Explanation-984 [link] [comments]  ( 9 min )
    [D] How to create and use multiple dataframes in pyspark?
    Hey All, I have to read in multiple JSON files with each one containing objects in an array. For each file I want to create a data frame (A matrix might also work) where the rows and columns are just integers pointing to a string. Like this ​ - 0 1 2 0 dsad asd ad 1 asd asd grth 2 ter xc wer ​ Using the same JSON file I also want to process the objects inside the arrays using the dataframe (matrix) above. So the process (pipeline) would be something like: |==> Create Matrix =======>| JSON file =>| | ===> Use matrix to process object. |==> Individual objects ==>| ​ I have been looking through the docs but still unsure how to do this. Should I use a dataframe or a spark matrix? How do I split the objects into parts and also generate the matrix? How do I combine dataframes which isn't joining? Just a point in the right direction would be great. Thanks in advance for this relatively simple question. submitted by /u/atticusfinch975 [link] [comments]  ( 9 min )
    [D] can somone help me get this paper
    I don't have access can somone help me get it please Thank you https://www.worldscientific.com/doi/abs/10.1142/S0218001418560062 submitted by /u/SilenceOfTheUnicorns [link] [comments]  ( 9 min )
    [D]Tips (Algorithms)
    I have started learning ML a month ago... Did a foundational Google course and read from some other sources..hav learnt most of the theories....What's the best place to learn algorithms according to you? Any other tips are also welcome submitted by /u/Buri-Buri_zaemon [link] [comments]  ( 9 min )
    [D] RX 7900 XTX vs RTX 4080
    I know AMD is working on making ROCM support for RDNA 3, would that rival nvidia? and would there be enough support for it to be usable? Nvidia cards are way more expensive and i would like to use it for gaming besides Machine learning for my study. Also, would this be overkill? will an RTX 4070 or an RX 7900 XT also do the job just fine? i am new to ML and won't be using it till early 2024, thank you all for reading. submitted by /u/RepresentativeIll155 [link] [comments]  ( 9 min )
    [D] How to describe XGBoost, Boosting and Bagging?
    Hi Can someone here please help me with this algorithm? What is the “Boosting” part of the algorithm? To my (limited) understanding XGBoost is an ensemble learning algorithm that uses many decision trees (efficiently), where each tree tries to correct the loss of the previous one. But I’m not sure how this is connected to “Boosting” and then it’s cousin “Bagging” Any intuition that may help me here? submitted by /u/Ok_Reality2341 [link] [comments]  ( 9 min )
    [D] A case for summaries over abstracts
    I usually peruse the abstract of a paper before deciding on whether to read it or not. However, lately I've started longing for more personalized summaries. I wonder what others think of abstract vs summaries and their preferences of the latter over former ? In your opinion, how far has the field progressed in summarization (https://paperswithcode.com/dataset/scitldr) ? submitted by /u/JurrasicBarf [link] [comments]  ( 9 min )
    [D] NeurIPS reviewers edited review and score after discussion period: can they delete their own revision history?
    Hi, we have a paper submission to NeurIPS and we have two reviewers who changed their scores and review content silently by editing the original review comment and score after the discussion period. The edited review comment now discusses entirely different point. We would like to raise this concern to AC but the thing is that we didn’t save the original review comment, and the “revision history” for some reason doesn’t show the previous content, other than the entry that there was previous version. But this revision history overall isn’t inconsistent (showing the last two history after the discussion period, but the ones before the period is not shown) Can reviewers delete their own revision history in OpenReview tool? I don’t know if this is a bug or they deleted them with an intention. submitted by /u/mayasang [link] [comments]  ( 9 min )
    [D] Testing at 80-95% - Newly collected recent data 55% - WHY!?
    I am currently do some predictions on some market data, I'm using both XGBoost and lightGBM (not both at the same time just experimenting using both algorithms). I have around 2500 features and 40k rows of data in my dataset which is being split 75% = train, 12.5% = valid, 12.5% = Test. The balance of the data is massively imbalanced with a binary classification. On training im seeing 0 = 20192 and 1 = 8337. I am not using SMOTE or undersampling but rather using the alogirhms own parameters to combat the Imbalance, for example scale_pos_weight: y_data[0]/y_data[1]. Training is going very well, im using hyperopt tuner to tune my paramets and usually on average get 75% accuracy on testing, training will usually be a little higher such as 77% and valid will be fairly close to test. But the …  ( 10 min )
    [P] Build a Recommender System that Includes Term / Vector Recall, DeepFM Ranking, Inference Engine and Web Application.
    Hello everyone! I've noticed that most beginner-level tutorials on recommender systems primarily focus on model training, with limited information about deploying them in a production environment. Additionally, the different usage of models in the recall (retrieval) and ranking modules can indeed be confusing for beginners. Recently, I've been working on a recommender system project that encompasses both offline development and online deployment, covering both recall and ranking modules. The entire project is developed using Python and executed on a single laptop. All components are contained within Docker, ensuring no impact on the local environment. The GitHub repo: https://github.com/akiragy/recsys_pipeline You can follow the commands provided in the README to run it. This project primarily utilizes PyTorch, Redis, Elasticsearch, Feast Feature Store, Triton Inference Server, and Flask. PyTorch is used for training the FM model for recall and the DeepFM model for ranking. Redis serves as the store for user terms and vectors, while Elasticsearch is used to create an item term index and a vector index. Redis and Elasticsearch form the recall module. Feast is utilized to store user and item features, while Triton serves as a real-time prediction engine. Feast and Triton form the ranking module. Flask is deployed as the web server, receiving recommendation requests and returning responses. Thanks for checking it out! submitted by /u/Johann_SebastianBach [link] [comments]  ( 9 min )
    [Research] Benchmarking Neural Network Generalization for Grammar Induction
    Benchmark: 🧘 BLISS – a Benchmark for Language Induction from Small Sets https://github.com/taucompling/bliss/ Paper: https://arxiv.org/abs/2308.08253 submitted by /u/nurikolan [link] [comments]  ( 9 min )
    [R] Improving model results with EDCR
    We released another preprint on a neuro-symbolic approach called "metacognitive error correction and detection rules" (EDCR). The idea is that if you have a trained neural model, you can symbolically fine tune the results with rules. In this initial study, we apply it to the classification of GPS movement traces. Video: https://www.youtube.com/watch?v=d_OV4lap_rk Preprint: https://arxiv.org/abs/2308.14250 Code: https://github.com/lab-v2/Error-Detection-and-Correction Further information: https://neurosymbolic.asu.edu/metacognition/ In the example below, we show the results for a single class. The rules detect errors by identifying classifications that may be incorrect and then re-assign to a new class. While recall can drop for a given class, we can bound the drop in recall with a hyperparameter - but this is guaranteed to improve precision. This is illustrated in the below figure. We show this approach leads to an overall improvement in accuracy over the base model, including the state-of-the-art. We also examine the effects when encountering classes not seen in the model's training data. We provide theoretical as well as empirical results and believe this approach can be used in other use-cases in the future. ​ https://preview.redd.it/3z0cdp80dulb1.png?width=635&format=png&auto=webp&s=2eff6ce0f2c7b6983dbfbc030f0f7993010a30fb submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    [D] 10 hard-earned lessons from shipping generative AI products over the past 18 months
    Hey all, I'm the founder of a generative AI consultancy and we build gen AI powered products for other companies. We've been doing this for 18 months now and I thought I share our learnings - it might help others. ​ It's a never ending battle to keep up with the latest tools and developments. By the time you ship your product it's already using an outdated tech-stack. There are no best-practices yet. You need to make a bet on tools/processes and hope that things won't change much by the time you ship (they will, see point 2). If your generative AI product doesn't have a VC-backed competitor, there will be one soon. In order to win you need one of the two things: either (1) the best distribution or (2) the generative AI component is hidden in your product so others don't/can't copy you. AI researchers / data scientists are suboptimal choice for AI engineering. They're expensive, won't be able to solve most of your problems and likely want to focus on more fundamental problems rather than building products. Software engineers make the best AI engineers. They are able to solve 80% of your problems right away and they are motivated because they can "work in AI". Product designers need to get more technical, AI engineers need to get more product-oriented. The gap currently is too big and this leads to all sorts of problems during product development. Demo bias is real and it makes it 10x harder to deliver something that's in alignment with your client's expectation. Communicating this effectively is a real and underrated skill. There's no such thing as off-the-shelf AI generated content yet. Current tools are not reliable enough, they hallucinate, make up stuff and produce inconsistent results (applies to text, voice, image and video). submitted by /u/BootstrapGuy [link] [comments]  ( 10 min )
    [D] What is the best text-to-speech tool (preferably free) currently?
    Hi everyone, I need a TTS tool that sounds exactly like a human voice. I want to use it to edit some of my YouTube videos. I see a lot of TTS platforms around. Which do you recommend? I hope this isn't too much to ask. I would gladly appreciate it. Thanks in advance. submitted by /u/cessilh1 [link] [comments]  ( 9 min )
    [R] Real-time Road Segmentation without Dense Depth Images
    You can use the following code if you want to detect the road in real-time in your vehicle/robot : https://github.com/ErkanMilli/3MT-RoadSeg . One of the main problems in road segmentation by using depth was that if a region is flat, such as walls, it may be detected as road. This was already a known phenomenon and to overcome this, surface normal estimation was used. But, SNE requires dense depth images. Instead, we used a multi-task architecture, and used surface normals as an auxiliary loss, which reduced computation time significantly and also we don't need a dense depth image. Only LiDAR (which is sparse in nature) is sufficient. submitted by /u/ozgurerkent [link] [comments]  ( 9 min )
    [P] Threshold of acceptability in a fill-mask task with BERT
    Hi everyone, I am very new to machine learning and statistics, and am currently building an experiment that includes probing the knowledge of a bert-base-uncased model in a fill-mask test, without fine-tuning - just the regular pretrained model. I want to see the models knowledge of certain grammatical notions in English - whether its judgements are similar to those of humans or not :) My point is to give the model inputs like: "what do you call a room filled with socks? you called it a [MASK] filled room", or "a monster who eats rats is called a [MASK] eater", and check the probabilities it gives to the corresponding singular and plural token, e.g. in the first case I want to probe "sock" \ "socks", and in the second case "rat" / "rats". I built a script which does exactly this - pulls…  ( 10 min )
    [R][P] One class is hard to detect in vision project
    Hi, I’ve been working for a while now on a project to detect points in medical images which are to be classified into 3 different classes, but my UNet really struggles to predict one of the 3 classes (>70% score when excluding this class vs ~30% when not). I have tried putting a separate decoder just for this one class but the results are worse, and I don’t really have other ideas to better my results. Do you have any ideas/techniques to help me improve my results? Thanks ! submitted by /u/maths_and_baguette [link] [comments]  ( 9 min )
    [D] Stanford's ML for Graphs course
    Hi everyone. Has anybody taken this course from Stanford https://online.stanford.edu/courses/xcs224w-machine-learning-graphs or any other course in the same online portal? Was it worth it? I am considering to apply. Thanks submitted by /u/Realistic-Bed2658 [link] [comments]  ( 9 min )
    [R] Recurrent Forward Forward: Accuracy Issues
    Problem I recently did a bit of a career switch from big tech IoT Rust job, into a machine learning research role. For the last few months, I have been working on building out the Recurrent Forward Forward model from Hinton's Forward Forward paper (Fig3): https://arxiv.org/abs/2212.13345 I have an implementation, but have been stuck for the past 4-6 weeks on trying to improve the accuracy. My implementation is only getting 95% test accuracy on MNIST. Hinton and Alex Ororbia (author of this) have been able to achieve high test accuracy (99%+) using this architecture, so I know it is possible. What I have tried I have tried many different things at this point: Different activation functions. Weight initialization. Regularization techniques like transforms, jitter, and dynamic nega…  ( 11 min )
    [D][R] Why do we need the convolution in upsample and downsample blocks?
    Hi fellow computer scientists and engineers, ​ I've been wondering why do we often have a convolution inside every upsample and downsample block. Well, it makes sense, if you intend to upscale some features and use a bilinear interpolation, then some error can be introduced due to interpolation inaccuracies. This is where convolution layer comes handy to help and support the upscaling. But is this really the reason behind it? Or is there a deeper explanation? ​ Also, just for the sake of curiosity. What if the scale_factor of an upsample block was 1. Should we still keep the convolution layer? or just get rid of all the upsample block since there is no actual "upsampling" being done at least in the context of the tensor dimensions. ​ Thank you :) submitted by /u/Christs_Elite [link] [comments]  ( 9 min )
  • Open

    Markov Property
    Is that wrong if a problem doesn't satisfy the Markov property, I cannot solve it with the RL approach either? submitted by /u/nimageran [link] [comments]  ( 9 min )
    Negative KL-divergence RLHF implementation
    I am struggling to understand one part of the FAQ of the transformer reinforcement learning library from HuggingFace: What Is the Concern with Negative KL Divergence? If you generate text by purely sampling from the model distribution things work fine in general. But when you use the generate method there are a few caveats because it does not always purely sample depending on the settings which can cause KL-divergence to go negative. Essentially when the active model achieves log_p_token_active < log_p_token_ref we get negative KL-div. This can happen in a several cases: top-k sampling: the model can smooth out the probability distribution causing the top-k tokens having a smaller probability than those of the reference model but they still are selected min_length: this ignores the EOS token until min_length is reached. thus the model can assign a very high log prob to the EOS token and very low prob to all others until min_length is reached batched generation: finished sequences in a batch are padded until all generations are finished. The model can learn to assign very low probabilities to the padding tokens unless they are properly masked or removed.These are just a few examples. Why is negative KL an issue? The total reward R is computed R = r - beta * KL so if the model can learn how to drive KL-divergence negative it effectively gets a positive reward. In many cases it can be much easier to exploit such a bug in the generation than actually learning the reward function. In addition the KL can become arbitrarily small thus the actual reward can be very small compared to it. I understand why the KL-divergence that is computed here is an approximation that can be negative as opposed to the real one. However, I cannot wrap my head around the details of why these specific sampling parameters would lead to negative KL-divergence. Could someone elaborate on these points? submitted by /u/Loud_Appointment_418 [link] [comments]  ( 10 min )
    Working on a project which involves creating an agent to work on chess environment
    I am using DQN Algorithm and A2C algorithm (Not using any lookaheads to see potential moves and only using self-learning coz my teacher asked me not to look into future combinations and let it play and understand itself) separately to check the performance of the agent and the neural network gives probabilities of the moves in the size of 4096 (64*64) . But the probabilities are decreasing with each and every move performed and they are overfitting to one move which is an invalid move (same case for both dqn and a2c) so in the bellman equation i removed the next reward prediction and put constant value of 1 to check whether it is at least trying to increase the probability for valid moves but that doesnt seems to be the case because it is still giving probability of 1 for an invalid move. and there is also this case where the probabilities are getting so small they are becoming nan values. can someone provide some insights for me to look into submitted by /u/S_U_B_B_U [link] [comments]  ( 9 min )
    UCL Reinforcement learning lectures
    I see lectures on youtube from UCL+DeppMind on RL spanning from 2015 through 2021. Which one would you say is the best to follow? I've heard many good things about David Silver's lectures, but how do the most recent, 2021, lectures compare? submitted by /u/Practical_Ad_8782 [link] [comments]  ( 9 min )
  • Open

    DFT mandalas
    Math books often use some illustration from the book contents as cover art. When they do, there’s often some mystery to the cover art, and a sense of accomplishment when you get far enough into the book to understand the significance of the cover. (See examples here.) William L. Briggs and Van Emden Henson wrote […] DFT mandalas first appeared on John D. Cook.  ( 5 min )
  • Open

    Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Metacognition with EDCR
    submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
  • Open

    Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part III
    This blog post is not the end of my journey to integrate GenAI with my “Thinking Like a Data Scientist” (TLADS) methodology, but it is the last post on this leg of the journey. And the journey has been fascinating.  I can’t wait to get this modified material in front of my students. In part… Read More »Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part III The post Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part III appeared first on Data Science Central.  ( 24 min )
  • Open

    Elon Musk's X to leverage public data for AI model training
    Elon Musk's X revealed its plans to utilize user data and publicly available information in training AI models. Despite Musk's assurance that only public data will be used, concerns around privacy linger. For expert insights into AI developments, look here first. https://preview.redd.it/mj64uof9rvlb1.png?width=2000&format=png&auto=webp&s=a856ce3e4b6063ebf7a585df3338142defba6323 X's approach to AI training Under the most recent privacy policy, X will harness the personal data it collects and publicly accessible information for its machine learning algorithms. Musk assures only publicly accessible data will be used, safeguarding private user information like DMs. However, with X having disbanded its press operation, more specific details about the data collected and its intended use still need to be provided. Unfolding plans of Musk Despite X's quiet stance on AI, Musk recently launched xAI, aspiring "to understand the true nature of the universe." xAI's homepage discloses plans to sync with X closely, possibly using collected user data to progress the mission. A competitive stance against LinkedIn suggests a possible additional motive for data collection, speculating an enhanced job and education section on X. Despite concerns about selling user data for revenue, concrete evidence is needed to support this argument, reflecting Twitter's previous strategy. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    The mystery of AI learning is solved by Stanford researchers
    Say goodbye to the black box of deep learning and hello to a new era of transparent, efficient, and ethical AI. Find out how this changes EVERYTHING! https://kinews24.de/stanford-cracks-the-ai-code-the-groundbreaking-law-of-equi-separation ​ submitted by /u/myreddit333 [link] [comments]  ( 9 min )
    Ideas for a high school AI/ML club
    I'm thinking of creating an AI club at my high school. The problem is, unlike something like math or coding, there aren't many competitions suitable to beginners and not a lot of previous template content to follow. Therefore, I need to forge my own path. I am curious what your ideas are for some engaging, high-school-friendly topics and events to have, especially if we can only meet for 30 minutes a week. Thanks in advance! submitted by /u/0xCUBE [link] [comments]  ( 9 min )
    An AI to help with my psychology assignment?
    My psychology masters assignments are to be handwritten and hence a somewhat painstaking process. To streamline I was looking for an AI that can guide me on the concepts and understanding of the given psychology subjects. I don't want to use it as a shortcut just a tool for studying and guiding. In accordance with books and Google. Can anyone know of such an AI? submitted by /u/Maddragon0088 [link] [comments]  ( 9 min )
    follow me on X for ai news without the garbage. just made an account bc im tired of these annoying accounts and decided to just make my own ai news account
    submitted by /u/nicdunz [link] [comments]  ( 9 min )
    Could AI be the game-changer in tackling the opioid epidemic?
    The stubborn and complex opioid epidemic may finally meet its match—AI. As the crisis continues taking a fearsome toll, experts are turning to advanced technology in their ongoing battle. If you want to stay on top of the latest trends and insights in AI, look here first. https://preview.redd.it/vm23xflorqlb1.jpg?width=1390&format=pjpg&auto=webp&s=212b88fb01eb0f7afaa5011120267ac4ce37ee35 AI’s evolving role in tackling the opioid crisis With a legacy of over 1 million overdose deaths since 1999, the opioid crisis has stubbornly resisted traditional preventive and regulatory measures. The latest AI-fueled developments offer newfound hope. Groundbreaking AI innovations are focusing on identifying individuals at potential risk, monitoring treatment progress, and predicting relapse probabilities. Decoding social media behavior offers an effective outlet for early intervention. More radically, AI-enabled wearable devices are being developed to detect overdose symptoms and automatically deliver lifesaving treatment. AI: A double-edged sword? Despite its promising potential, AI application in this sphere also raises concerns around privacy rights and misinformation. Facial recognition technology could lead to discrimination, while the risk of false data being fed into chatbots causing harm cannot be undermined. Trust in AI and its appropriate deployment will be crucial to ensuring its positive contribution rather than being a dystopian threat. P.S. If you like this kind of analysis, you’ll love my free newsletter that tracks the most relevant news and research in AI and tech. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    Knowledge Graph Embeddings in the Biomedical Domain: Are They Useful? A Look at Link Prediction, Rule Learning, and Downstream Polypharmacy Tasks. (arXiv:2305.19979v2 [cs.LG] UPDATED)
    Knowledge graphs are powerful tools for representing and organising complex biomedical data. Several knowledge graph embedding algorithms have been proposed to learn from and complete knowledge graphs. However, a recent study demonstrates the limited efficacy of these embedding algorithms when applied to biomedical knowledge graphs, raising the question of whether knowledge graph embeddings have limitations in biomedical settings. This study aims to apply state-of-the-art knowledge graph embedding models in the context of a recent biomedical knowledge graph, BioKG, and evaluate their performance and potential downstream uses. We achieve a three-fold improvement in terms of performance based on the HITS@10 score over previous work on the same biomedical knowledge graph. Additionally, we provide interpretable predictions through a rule-based method. We demonstrate that knowledge graph embedding models are applicable in practice by evaluating the best-performing model on four tasks that represent real-life polypharmacy situations. Results suggest that knowledge learnt from large biomedical knowledge graphs can be transferred to such downstream use cases. Our code is available at https://github.com/aryopg/biokge.  ( 3 min )
    CongNaMul: A Dataset for Advanced Image Processing of Soybean Sprouts. (arXiv:2308.15690v2 [cs.CV] UPDATED)
    We present 'CongNaMul', a comprehensive dataset designed for various tasks in soybean sprouts image analysis. The CongNaMul dataset is curated to facilitate tasks such as image classification, semantic segmentation, decomposition, and measurement of length and weight. The classification task provides four classes to determine the quality of soybean sprouts: normal, broken, spotted, and broken and spotted, for the development of AI-aided automatic quality inspection technology. For semantic segmentation, images with varying complexity, from single sprout images to images with multiple sprouts, along with human-labelled mask images, are included. The label has 4 different classes: background, head, body, tail. The dataset also provides images and masks for the image decomposition task, including two separate sprout images and their combined form. Lastly, 5 physical features of sprouts (head length, body length, body thickness, tail length, weight) are provided for image-based measurement tasks. This dataset is expected to be a valuable resource for a wide range of research and applications in the advanced analysis of images of soybean sprouts. Also, we hope that this dataset can assist researchers studying classification, semantic segmentation, decomposition, and physical feature measurement in other industrial fields, in evaluating their models. The dataset is available at the authors' repository. (https://bhban.kr/data)  ( 2 min )
    Online Distributed Learning with Quantized Finite-Time Coordination. (arXiv:2307.06620v2 [cs.LG] UPDATED)
    In this paper we consider online distributed learning problems. Online distributed learning refers to the process of training learning models on distributed data sources. In our setting a set of agents need to cooperatively train a learning model from streaming data. Differently from federated learning, the proposed approach does not rely on a central server but only on peer-to-peer communications among the agents. This approach is often used in scenarios where data cannot be moved to a centralized location due to privacy, security, or cost reasons. In order to overcome the absence of a central server, we propose a distributed algorithm that relies on a quantized, finite-time coordination protocol to aggregate the locally trained models. Furthermore, our algorithm allows for the use of stochastic gradients during local training. Stochastic gradients are computed using a randomly sampled subset of the local training data, which makes the proposed algorithm more efficient and scalable than traditional gradient descent. In our paper, we analyze the performance of the proposed algorithm in terms of the mean distance from the online solution. Finally, we present numerical results for a logistic regression task.  ( 2 min )
    Neural ShDF: Reviving an Efficient and Consistent Mesh Segmentation Method. (arXiv:2306.11737v2 [cs.GR] UPDATED)
    Partitioning a polygonal mesh into meaningful parts can be challenging. Many applications require decomposing such structures for further processing in computer graphics. In the last decade, several methods were proposed to tackle this problem, at the cost of intensive computational times. Recently, machine learning has proven to be effective for the segmentation task on 3D structures. Nevertheless, these state-of-the-art methods are often hardly generalizable and require dividing the learned model into several specific classes of objects to avoid overfitting. We present a data-driven approach leveraging deep learning to encode a mapping function prior to mesh segmentation for multiple applications. Our network reproduces a neighborhood map using our knowledge of the \textsl{Shape Diameter Function} (SDF) method using similarities among vertex neighborhoods. Our approach is resolution-agnostic as we downsample the input meshes and query the full-resolution structure solely for neighborhood contributions. Using our predicted SDF values, we can inject the resulting structure into a graph-cut algorithm to generate an efficient and robust mesh segmentation while considerably reducing the required computation times.  ( 2 min )
    Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning. (arXiv:2307.04726v2 [cs.LG] UPDATED)
    Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for data collection. In contrast to behavior cloning, which assumes the data is collected from expert demonstrations, offline RL can work with non-expert data and multimodal behavior policies. However, offline RL algorithms face challenges in handling distribution shifts and effectively representing policies due to the lack of online interaction during training. Prior work on offline RL uses conditional diffusion models to represent multimodal behavior in the dataset. Nevertheless, these methods are not tailored toward alleviating the out-of-distribution state generalization. We introduce a novel method, named State Reconstruction for Diffusion Policies (SRDP), incorporating state reconstruction feature learning in the recent class of diffusion policies to address the out-of-distribution generalization problem. State reconstruction loss promotes more descriptive representation learning of states to alleviate the distribution shift incurred by the out-of-distribution (OOD) states. We design a novel 2D Multimodal Contextual Bandit environment to illustrate the OOD generalization of SRDP compared to prior algorithms. In addition, we assess the performance of our model on D4RL continuous control benchmarks, namely the navigation of an 8-DoF ant and forward locomotion of half-cheetah, hopper, and walker2d, achieving state-of-the-art results.  ( 2 min )
    Improving the Validity of Decision Trees as Explanations. (arXiv:2306.06777v3 [cs.LG] UPDATED)
    In classification and forecasting with tabular data, one often utilizes tree-based models. Those can be competitive with deep neural networks on tabular data [cf. Grinsztajn et al., NeurIPS 2022, arXiv:2207.08815] and, under some conditions, explainable. The explainability depends on the depth of the tree and the accuracy in each leaf of the tree. Decision trees containing leaves with unbalanced accuracy can provide misleading explanations. Low-accuracy leaves give less valid explanations, which could be interpreted as unfairness among explanations. Here, we train a shallow tree with the objective of minimizing the maximum misclassification error across each leaf node. Then, we extend each leaf with a separate tree-based model. The shallow tree provides a global explanation, while the overall statistical performance of the shallow tree with extended leaves improves upon decision trees of unlimited depth trained using classical methods (e.g., CART) and is comparable to state-of-the-art methods (e.g., well-tuned XGBoost).  ( 2 min )
    Dynamic Data Augmentation via MCTS for Prostate MRI Segmentation. (arXiv:2305.15777v2 [eess.IV] UPDATED)
    Medical image data are often limited due to the expensive acquisition and annotation process. Hence, training a deep-learning model with only raw data can easily lead to overfitting. One solution to this problem is to augment the raw data with various transformations, improving the model's ability to generalize to new data. However, manually configuring a generic augmentation combination and parameters for different datasets is non-trivial due to inconsistent acquisition approaches and data distributions. Therefore, automatic data augmentation is proposed to learn favorable augmentation strategies for different datasets while incurring large GPU overhead. To this end, we present a novel method, called Dynamic Data Augmentation (DDAug), which is efficient and has negligible computation cost. Our DDAug develops a hierarchical tree structure to represent various augmentations and utilizes an efficient Monte-Carlo tree searching algorithm to update, prune, and sample the tree. As a result, the augmentation pipeline can be optimized for each dataset automatically. Experiments on multiple Prostate MRI datasets show that our method outperforms the current state-of-the-art data augmentation strategies.
    Biclustering Methods via Sparse Penalty. (arXiv:2308.14388v2 [stat.ML] UPDATED)
    In this paper, we first reviewed several biclustering methods that are used to identify the most significant clusters in gene expression data. Here we mainly focused on the SSVD(sparse SVD) method and tried a new sparse penalty named "Prenet penalty" which has been used only in factor analysis to gain sparsity. Then in the simulation study, we tried different types of generated datasets (with different sparsity and dimension) and tried 1-layer approximation then for k-layers which shows the mixed Prenet penalty is very effective for non-overlapped data. Finally, we used some real gene expression data to show the behavior of our methods.
    Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings. (arXiv:2306.17670v2 [cs.NE] UPDATED)
    Spiking Neural Networks (SNNs) are a promising research direction for building power-efficient information processing systems, especially for temporal tasks such as speech recognition. In SNNs, delays refer to the time needed for one spike to travel from one neuron to another. These delays matter because they influence the spike arrival times, and it is well-known that spiking neurons respond more strongly to coincident input spikes. More formally, it has been shown theoretically that plastic delays greatly increase the expressivity in SNNs. Yet, efficient algorithms to learn these delays have been lacking. Here, we propose a new discrete-time algorithm that addresses this issue in deep feedforward SNNs using backpropagation, in an offline manner. To simulate delays between consecutive layers, we use 1D convolutions across time. The kernels contain only a few non-zero weights - one per synapse - whose positions correspond to the delays. These positions are learned together with the weights using the recently proposed Dilated Convolution with Learnable Spacings (DCLS). We evaluated our method on three datasets: the Spiking Heidelberg Dataset (SHD), the Spiking Speech Commands (SSC) and its non-spiking version Google Speech Commands v0.02 (GSC) benchmarks, which require detecting temporal patterns. We used feedforward SNNs with two or three hidden fully connected layers, and vanilla leaky integrate-and fire neurons. We showed that fixed random delays help and that learning them helps even more. Furthermore, our method outperformed the state-of-the-art in the three datasets without using recurrent connections and with substantially fewer parameters. Our work demonstrates the potential of delay learning in developing accurate and precise models for temporal data processing. Our code is based on PyTorch / SpikingJelly and available at: https://github.com/Thvnvtos/SNN-delays
    Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML. (arXiv:2306.05109v2 [cs.LG] UPDATED)
    Medical applications of machine learning (ML) have experienced a surge in popularity in recent years. The intensive care unit (ICU) is a natural habitat for ML given the abundance of available data from electronic health records. Models have been proposed to address numerous ICU prediction tasks like the early detection of complications. While authors frequently report state-of-the-art performance, it is challenging to verify claims of superiority. Datasets and code are not always published, and cohort definitions, preprocessing pipelines, and training setups are difficult to reproduce. This work introduces Yet Another ICU Benchmark (YAIB), a modular framework that allows researchers to define reproducible and comparable clinical ML experiments; we offer an end-to-end solution from cohort definition to model evaluation. The framework natively supports most open-access ICU datasets (MIMIC III/IV, eICU, HiRID, AUMCdb) and is easily adaptable to future ICU datasets. Combined with a transparent preprocessing pipeline and extensible training code for multiple ML and deep learning models, YAIB enables unified model development. Our benchmark comes with five predefined established prediction tasks (mortality, acute kidney injury, sepsis, kidney function, and length of stay) developed in collaboration with clinicians. Adding further tasks is straightforward by design. Using YAIB, we demonstrate that the choice of dataset, cohort definition, and preprocessing have a major impact on the prediction performance - often more so than model class - indicating an urgent need for YAIB as a holistic benchmarking tool. We provide our work to the clinical ML community to accelerate method development and enable real-world clinical implementations. Software Repository: https://github.com/rvandewater/YAIB.
    MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision. (arXiv:2308.16139v2 [cs.CV] UPDATED)
    We present MedShapeNet, a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D surgical instrument models. Prior to the deep learning era, the broad application of statistical shape models (SSMs) in medical image analysis is evidence that shapes have been commonly used to describe medical data. Nowadays, however, state-of-the-art (SOTA) deep learning algorithms in medical imaging are predominantly voxel-based. In computer vision, on the contrary, shapes (including, voxel occupancy grids, meshes, point clouds and implicit surface models) are preferred data representations in 3D, as seen from the numerous shape-related publications in premier vision conferences, such as the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), as well as the increasing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models) in computer vision research. MedShapeNet is created as an alternative to these commonly used shape benchmarks to facilitate the translation of data-driven vision algorithms to medical applications, and it extends the opportunities to adapt SOTA vision algorithms to solve critical medical problems. Besides, the majority of the medical shapes in MedShapeNet are modeled directly on the imaging data of real patients, and therefore it complements well existing shape benchmarks comprising of computer-aided design (CAD) models. MedShapeNet currently includes more than 100,000 medical shapes, and provides annotations in the form of paired data. It is therefore also a freely available repository of 3D models for extended reality (virtual reality - VR, augmented reality - AR, mixed reality - MR) and medical 3D printing. This white paper describes in detail the motivations behind MedShapeNet, the shape acquisition procedures, the use cases, as well as the usage of the online shape search portal: https://medshapenet.ikim.nrw/  ( 4 min )
    Why Does Little Robustness Help? Understanding and Improving Adversarial Transferability from Surrogate Training. (arXiv:2307.07873v5 [cs.LG] UPDATED)
    Adversarial examples (AEs) for DNNs have been shown to be transferable: AEs that successfully fool white-box surrogate models can also deceive other black-box models with different architectures. Although a bunch of empirical studies have provided guidance on generating highly transferable AEs, many of these findings lack explanations and even lead to inconsistent advice. In this paper, we take a further step towards understanding adversarial transferability, with a particular focus on surrogate aspects. Starting from the intriguing little robustness phenomenon, where models adversarially trained with mildly perturbed adversarial samples can serve as better surrogates, we attribute it to a trade-off between two predominant factors: model smoothness and gradient similarity. Our investigations focus on their joint effects, rather than their separate correlations with transferability. Through a series of theoretical and empirical analyses, we conjecture that the data distribution shift in adversarial training explains the degradation of gradient similarity. Building on these insights, we explore the impacts of data augmentation and gradient regularization on transferability and identify that the trade-off generally exists in the various training mechanisms, thus building a comprehensive blueprint for the regulation mechanism behind transferability. Finally, we provide a general route for constructing better surrogates to boost transferability which optimizes both model smoothness and gradient similarity simultaneously, e.g., the combination of input gradient regularization and sharpness-aware minimization (SAM), validated by extensive experiments. In summary, we call for attention to the united impacts of these two factors for launching effective transfer attacks, rather than optimizing one while ignoring the other, and emphasize the crucial role of manipulating surrogate models.  ( 3 min )
    MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation. (arXiv:2305.08396v4 [eess.IV] UPDATED)
    In this work, we present MaxViT-UNet, an Encoder-Decoder based hybrid vision transformer (CNN-Transformer) for medical image segmentation. The proposed Hybrid Decoder, based on MaxViT-block, is designed to harness the power of both the convolution and self-attention mechanisms at each decoding stage with a nominal memory and computational burden. The inclusion of multi-axis self-attention, within each decoder stage, significantly enhances the discriminating capacity between the object and background regions, thereby helping in improving the segmentation efficiency. In the Hybrid Decoder block, the fusion process commences by integrating the upsampled lower-level decoder features, obtained through transpose convolution, with the skip-connection features derived from the hybrid encoder. Subsequently, the fused features undergo refinement through the utilization of a multi-axis attention mechanism. The proposed decoder block is repeated multiple times to progressively segment the nuclei regions. Experimental results on MoNuSeg18 and MoNuSAC20 dataset demonstrates the effectiveness of the proposed technique. Our MaxViT-UNet outperformed the previous CNN-based (UNet) and Transformer-based (Swin-UNet) techniques by a considerable margin on both of the standard datasets. The following github (https://github.com/PRLAB21/MaxViT-UNet) contains the implementation and trained weights.  ( 2 min )
    Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models. (arXiv:2305.10474v2 [cs.CV] UPDATED)
    Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy. While off-the-shelf billion-scale datasets for image generation are available, collecting similar video data of the same scale is still challenging. Also, training a video diffusion model is computationally much more expensive than its image counterpart. In this work, we explore finetuning a pretrained image diffusion model with video data as a practical solution for the video synthesis task. We find that naively extending the image noise prior to video noise prior in video diffusion leads to sub-optimal performance. Our carefully designed video noise prior leads to substantially better performance. Extensive experimental validation shows that our model, Preserve Your Own Correlation (PYoCo), attains SOTA zero-shot text-to-video results on the UCF-101 and MSR-VTT benchmarks. It also achieves SOTA video generation quality on the small-scale UCF-101 benchmark with a $10\times$ smaller model using significantly less computation than the prior art.  ( 2 min )
    Neural Mixed Effects for Nonlinear Personalized Predictions. (arXiv:2306.08149v3 [cs.LG] UPDATED)
    Personalized prediction is a machine learning approach that predicts a person's future observations based on their past labeled observations and is typically used for sequential tasks, e.g., to predict daily mood ratings. When making personalized predictions, a model can combine two types of trends: (a) trends shared across people, i.e., person-generic trends, such as being happier on weekends, and (b) unique trends for each person, i.e., person-specific trends, such as a stressful weekly meeting. Mixed effect models are popular statistical models to study both trends by combining person-generic and person-specific parameters. Though linear mixed effect models are gaining popularity in machine learning by integrating them with neural networks, these integrations are currently limited to linear person-specific parameters: ruling out nonlinear person-specific trends. In this paper, we propose Neural Mixed Effect (NME) models to optimize nonlinear person-specific parameters anywhere in a neural network in a scalable manner. NME combines the efficiency of neural network optimization with nonlinear mixed effects modeling. Empirically, we observe that NME improves performance across six unimodal and multimodal datasets, including a smartphone dataset to predict daily mood and a mother-adolescent dataset to predict affective state sequences where half the mothers experience at least moderate symptoms of depression. Furthermore, we evaluate NME for two model architectures, including for neural conditional random fields (CRF) to predict affective state sequences where the CRF learns nonlinear person-specific temporal transitions between affective states. Analysis of these person-specific transitions on the mother-adolescent dataset shows interpretable trends related to the mother's depression symptoms.  ( 3 min )
    Mixed-type Distance Shrinkage and Selection for Clustering via Kernel Metric Learning. (arXiv:2306.01890v2 [cs.LG] UPDATED)
    Distance-based clustering and classification are widely used in various fields to group mixed numeric and categorical data. In many algorithms, a predefined distance measurement is used to cluster data points based on their dissimilarity. While there exist numerous distance-based measures for data with pure numerical attributes and several ordered and unordered categorical metrics, an efficient and accurate distance for mixed-type data that utilizes the continuous and discrete properties simulatenously is an open problem. Many metrics convert numerical attributes to categorical ones or vice versa. They handle the data points as a single attribute type or calculate a distance between each attribute separately and add them up. We propose a metric called KDSUM that uses mixed kernels to measure dissimilarity, with cross-validated optimal bandwidth selection. We demonstrate that KDSUM is a shrinkage method from existing mixed-type metrics to a uniform dissimilarity metric, and improves clustering accuracy when utilized in existing distance-based clustering algorithms on simulated and real-world datasets containing continuous-only, categorical-only, and mixed-type data.  ( 2 min )
    Multi-Response Heteroscedastic Gaussian Process Models and Their Inference. (arXiv:2308.15370v2 [stat.ML] UPDATED)
    Despite the widespread utilization of Gaussian process models for versatile nonparametric modeling, they exhibit limitations in effectively capturing abrupt changes in function smoothness and accommodating relationships with heteroscedastic errors. Addressing these shortcomings, the heteroscedastic Gaussian process (HeGP) regression seeks to introduce flexibility by acknowledging the variability of residual variances across covariates in the regression model. In this work, we extend the HeGP concept, expanding its scope beyond regression tasks to encompass classification and state-space models. To achieve this, we propose a novel framework where the Gaussian process is coupled with a covariate-induced precision matrix process, adopting a mixture formulation. This approach enables the modeling of heteroscedastic covariance functions across covariates. To mitigate the computational challenges posed by sampling, we employ variational inference to approximate the posterior and facilitate posterior predictive modeling. Additionally, our training process leverages an EM algorithm featuring closed-form M-step updates to efficiently evaluate the heteroscedastic covariance function. A notable feature of our model is its consistent performance on multivariate responses, accommodating various types (continuous or categorical) seamlessly. Through a combination of simulations and real-world applications in climatology, we illustrate the model's prowess and advantages. By overcoming the limitations of traditional Gaussian process models, our proposed framework offers a robust and versatile tool for a wide array of applications.  ( 2 min )
    Speeding up Fourier Neural Operators via Mixed Precision. (arXiv:2307.15034v2 [cs.LG] UPDATED)
    The Fourier neural operator (FNO) is a powerful technique for learning surrogate maps for partial differential equation (PDE) solution operators. For many real-world applications, which often require high-resolution data points, training time and memory usage are significant bottlenecks. While there are mixed-precision training techniques for standard neural networks, those work for real-valued datatypes on finite dimensions and therefore cannot be directly applied to FNO, which crucially operates in the (complex-valued) Fourier domain and in function spaces. On the other hand, since the Fourier transform is already an approximation (due to discretization error), we do not need to perform the operation at full precision. In this work, we (i) profile memory and runtime for FNO with full and mixed-precision training, (ii) conduct a study on the numerical stability of mixed-precision training of FNO, and (iii) devise a training routine which substantially decreases training time and memory usage (up to 34%), with little or no reduction in accuracy, on the Navier-Stokes and Darcy flow equations. Combined with the recently proposed tensorized FNO (Kossaifi et al., 2023), the resulting model has far better performance while also being significantly faster than the original FNO.  ( 2 min )
    The Role of Diverse Replay for Generalisation in Reinforcement Learning. (arXiv:2306.05727v2 [cs.LG] UPDATED)
    In reinforcement learning (RL), key components of many algorithms are the exploration strategy and replay buffer. These strategies regulate what environment data is collected and trained on and have been extensively studied in the RL literature. In this paper, we investigate the impact of these components in the context of generalisation in multi-task RL. We investigate the hypothesis that collecting and training on more diverse data from the training environments will improve zero-shot generalisation to new tasks. We motivate mathematically and show empirically that generalisation to tasks that are "reachable'' during training is improved by increasing the diversity of transitions in the replay buffer. Furthermore, we show empirically that this same strategy also shows improvement for generalisation to similar but "unreachable'' tasks which could be due to improved generalisation of the learned latent representations.  ( 2 min )
    pTSE: A Multi-model Ensemble Method for Probabilistic Time Series Forecasting. (arXiv:2305.11304v2 [cs.LG] UPDATED)
    Various probabilistic time series forecasting models have sprung up and shown remarkably good performance. However, the choice of model highly relies on the characteristics of the input time series and the fixed distribution that the model is based on. Due to the fact that the probability distributions cannot be averaged over different models straightforwardly, the current time series model ensemble methods cannot be directly applied to improve the robustness and accuracy of forecasting. To address this issue, we propose pTSE, a multi-model distribution ensemble method for probabilistic forecasting based on Hidden Markov Model (HMM). pTSE only takes off-the-shelf outputs from member models without requiring further information about each model. Besides, we provide a complete theoretical analysis of pTSE to prove that the empirical distribution of time series subject to an HMM will converge to the stationary distribution almost surely. Experiments on benchmarks show the superiority of pTSE overall member models and competitive ensemble methods.  ( 2 min )
    Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media. (arXiv:2307.09312v2 [cs.CL] UPDATED)
    We present the Multi-Modal Discussion Transformer (mDT), a novel multi-modal graph-based transformer model for detecting hate speech in online social networks, such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the contextual relationships in the entire discussion surrounding a comment and grounding the interwoven fusion layers that combine individual comments' text and image embeddings instead of processing modalities separately. We compare the performance of our model to baselines that only process individual comments and conduct extensive ablation studies. To evaluate our work, we present a new dataset, HatefulDiscussions, comprising complete multi-modal discussions from multiple online communities on Reddit. We conclude with future work for multimodal solutions to deliver social value in online contexts, arguing that capturing a holistic view of a conversation significantly advances the effort to detect anti-social behaviour.  ( 2 min )
    Generative Sliced MMD Flows with Riesz Kernels. (arXiv:2305.11463v2 [cs.LG] UPDATED)
    Maximum mean discrepancy (MMD) flows suffer from high computational costs in large scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = - \Vert x-y\Vert^r$, $r \in (0,2)$ have exceptional properties which allow their efficient computation. We prove that the MMD of Riesz kernels coincides with the MMD of their sliced version. As a consequence, the computation of gradients of MMDs can be performed in the one-dimensional setting. Here, for $r=1$, a simple sorting algorithm can be applied to reduce the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As another interesting follow-up result, the MMD of compactly supported measures can be estimated from above and below by the Wasserstein-1 distance. For the implementations we approximate the gradient of the sliced MMD by using only a finite number $P$ of slices. We show that the resulting error has complexity $O(\sqrt{d/P})$, where $d$ is the data dimension. These results enable us to train generative models by approximating MMD gradient flows by neural networks even for image applications. We demonstrate the efficiency of our model by image generation on MNIST, FashionMNIST and CIFAR10.  ( 2 min )
    Data-driven Predictive Latency for 5G: A Theoretical and Experimental Analysis Using Network Measurements. (arXiv:2307.02329v3 [cs.NI] UPDATED)
    The advent of novel 5G services and applications with binding latency requirements and guaranteed Quality of Service (QoS) hastened the need to incorporate autonomous and proactive decision-making in network management procedures. The objective of our study is to provide a thorough analysis of predictive latency within 5G networks by utilizing real-world network data that is accessible to mobile network operators (MNOs). In particular, (i) we present an analytical formulation of the user-plane latency as a Hypoexponential distribution, which is validated by means of a comparative analysis with empirical measurements, and (ii) we conduct experimental results of probabilistic regression, anomaly detection, and predictive forecasting leveraging on emerging domains in Machine Learning (ML), such as Bayesian Learning (BL) and Machine Learning on Graphs (GML). We test our predictive framework using data gathered from scenarios of vehicular mobility, dense-urban traffic, and social gathering events. Our results provide valuable insights into the efficacy of predictive algorithms in practical applications.  ( 2 min )
    DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks. (arXiv:2307.05628v3 [q-bio.GN] UPDATED)
    Pre-trained large language models demonstrate potential in extracting information from DNA sequences, yet adapting to a variety of tasks and data modalities remains a challenge. To address this, we propose DNAGPT, a generalized DNA pre-training model trained on over 200 billion base pairs from all mammals. By enhancing the classic GPT model with a binary classification task (DNA sequence order), a numerical regression task (guanine-cytosine content prediction), and a comprehensive token language, DNAGPT can handle versatile DNA analysis tasks while processing both sequence and numerical data. Our evaluation of genomic signal and region recognition, mRNA abundance regression, and artificial genomes generation tasks demonstrates DNAGPT's superior performance compared to existing models designed for specific downstream tasks, benefiting from pre-training using the newly designed model structure.  ( 2 min )
    Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks. (arXiv:2206.07741v2 [cs.LG] UPDATED)
    The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference, facilitating the use of DNNs on edge computing platforms. Recent efforts at quantizing DNNs have employed a range of techniques encompassing progressive quantization, step-size adaptation, and gradient scaling. This paper proposes a new quantization approach for mixed precision convolutional neural networks (CNNs) targeting edge-computing. Our method establishes a new pareto frontier in model accuracy and memory footprint demonstrating a range of quantized models, delivering best-in-class accuracy below 4.3 MB of weights (wgts.) and activations (acts.). Our main contributions are: (i) hardware-aware heterogeneous differentiable quantization with tensor-sliced learned precision, (ii) targeted gradient modification for wgts. and acts. to mitigate quantization errors, and (iii) a multi-phase learning schedule to address instability in learning arising from updates to the learned quantizer and model parameters. We demonstrate the effectiveness of our techniques on the ImageNet dataset across a range of models including EfficientNet-Lite0 (e.g., 4.14MB of wgts. and acts. at 67.66% accuracy) and MobileNetV2 (e.g., 3.51MB wgts. and acts. at 65.39% accuracy).  ( 2 min )
    Seeking Interpretability and Explainability in Binary Activated Neural Networks. (arXiv:2209.03450v2 [cs.LG] UPDATED)
    We study the use of binary activated neural networks as interpretable and explainable predictors in the context of regression tasks on tabular data; more specifically, we provide guarantees on their expressiveness, present an approach based on the efficient computation of SHAP values for quantifying the relative importance of the features, hidden neurons and even weights. As the model's simplicity is instrumental in achieving interpretability, we propose a greedy algorithm for building compact binary activated networks. This approach doesn't need to fix an architecture for the network in advance: it is built one layer at a time, one neuron at a time, leading to predictors that aren't needlessly complex for a given task.  ( 2 min )
    Inferring Traffic Models in Terminal Airspace from Flight Tracks and Procedures. (arXiv:2303.09981v2 [cs.LG] UPDATED)
    Realistic aircraft trajectory models are useful in the design and validation of air traffic management (ATM) systems. Models of aircraft operated under instrument flight rules (IFR) require capturing the variability inherent in how aircraft follow standard flight procedures. The variability in aircraft behavior varies among flight stages. In this paper, we propose a probabilistic model that can learn the variability from the procedural data and flight tracks collected from radar surveillance data. For each segment, a Gaussian mixture model is used to learn the deviations of aircraft trajectories from their procedures. Given new procedures, we can generate synthetic trajectories by sampling a series of deviations from the trained Gaussian distributions and reconstructing the aircraft trajectory using the deviations and the procedures. We extend this method to capture pairwise correlations between aircraft and show how a pairwise model can be used to generate traffic involving an arbitrary number of aircraft. We demonstrate the proposed models on the arrival tracks and procedures of the John F. Kennedy International Airport. The distributional similarity between the original and the synthetic trajectory dataset was evaluated using the Jensen-Shannon divergence between the empirical distributions of different variables. We also provide qualitative analyses of the synthetic trajectories generated from the models.  ( 2 min )
    Collage Diffusion. (arXiv:2303.00262v2 [cs.CV] UPDATED)
    We seek to give users precise control over diffusion-based image generation by modeling complex scenes as sequences of layers, which define the desired spatial arrangement and visual attributes of objects in the scene. Collage Diffusion harmonizes the input layers to make objects fit together -- the key challenge involves minimizing changes in the positions and key visual attributes of the input layers while allowing other attributes to change in the harmonization process. We ensure that objects are generated in the correct locations by modifying text-image cross-attention with the layers' alpha masks. We preserve key visual attributes of input layers by learning specialized text representations per layer and by extending ControlNet to operate on layers. Layer input allows users to control the extent of image harmonization on a per-object basis, and users can even iteratively edit individual objects in generated images while keeping other objects fixed. By leveraging the rich information present in layer input, Collage Diffusion generates globally harmonized images that maintain desired object characteristics better than prior approaches.  ( 2 min )
    Fair Attribute Completion on Graph with Missing Attributes. (arXiv:2302.12977v3 [cs.LG] UPDATED)
    Tackling unfairness in graph learning models is a challenging task, as the unfairness issues on graphs involve both attributes and topological structures. Existing work on fair graph learning simply assumes that attributes of all nodes are available for model training and then makes fair predictions. In practice, however, the attributes of some nodes might not be accessible due to missing data or privacy concerns, which makes fair graph learning even more challenging. In this paper, we propose FairAC, a fair attribute completion method, to complement missing information and learn fair node embeddings for graphs with missing attributes. FairAC adopts an attention mechanism to deal with the attribute missing problem and meanwhile, it mitigates two types of unfairness, i.e., feature unfairness from attributes and topological unfairness due to attribute completion. FairAC can work on various types of homogeneous graphs and generate fair embeddings for them and thus can be applied to most downstream tasks to improve their fairness performance. To our best knowledge, FairAC is the first method that jointly addresses the graph attribution completion and graph unfairness problems. Experimental results on benchmark datasets show that our method achieves better fairness performance with less sacrifice in accuracy, compared with the state-of-the-art methods of fair graph learning. Code is available at: https://github.com/donglgcn/FairAC.  ( 2 min )
    From Chaos Comes Order: Ordering Event Representations for Object Recognition and Detection. (arXiv:2304.13455v4 [cs.CV] UPDATED)
    Today, state-of-the-art deep neural networks that process events first convert them into dense, grid-like input representations before using an off-the-shelf network. However, selecting the appropriate representation for the task traditionally requires training a neural network for each representation and selecting the best one based on the validation score, which is very time-consuming. This work eliminates this bottleneck by selecting representations based on the Gromov-Wasserstein Discrepancy (GWD) between raw events and their representation. It is about 200 times faster to compute than training a neural network and preserves the task performance ranking of event representations across multiple representations, network backbones, datasets, and tasks. Thus finding representations with high task scores is equivalent to finding representations with a low GWD. We use this insight to, for the first time, perform a hyperparameter search on a large family of event representations, revealing new and powerful representations that exceed the state-of-the-art. Our optimized representations outperform existing representations by 1.7 mAP on the 1 Mpx dataset and 0.3 mAP on the Gen1 dataset, two established object detection benchmarks, and reach a 3.8% higher classification score on the mini N-ImageNet benchmark. Moreover, we outperform state-of-the-art by 2.1 mAP on Gen1 and state-of-the-art feed-forward methods by 6.0 mAP on the 1 Mpx datasets. This work opens a new unexplored field of explicit representation optimization for event-based learning.  ( 3 min )
    Learning to Taste: A Multimodal Wine Dataset. (arXiv:2308.16900v1 [cs.LG])
    We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique vintages, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor annotations on a subset by conducting a wine-tasting experiment with 256 participants who were asked to rank wines based on their similarity in flavor, resulting in more than 5k pairwise flavor distances. We propose a low-dimensional concept embedding algorithm that combines human experience with automatic machine similarity kernels. We demonstrate that this shared concept embedding space improves upon separate embedding spaces for coarse flavor classification (alcohol percentage, country, grape, price, rating) and aligns with the intricate human perception of flavor.  ( 2 min )
    GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields. (arXiv:2308.16891v1 [cs.RO])
    It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs to have a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present $\textbf{GNFactor}$, a visual behavior cloning agent for multi-task robotic manipulation with $\textbf{G}$eneralizable $\textbf{N}$eural feature $\textbf{F}$ields. GNFactor jointly optimizes a generalizable neural field (GNF) as a reconstruction module and a Perceiver Transformer as a decision-making module, leveraging a shared deep 3D voxel representation. To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model ($\textit{e.g.}$, Stable Diffusion) to distill rich semantic information into the deep 3D voxel. We evaluate GNFactor on 3 real robot tasks and perform detailed ablations on 10 RLBench tasks with a limited number of demonstrations. We observe a substantial improvement of GNFactor over current state-of-the-art methods in seen and unseen tasks, demonstrating the strong generalization ability of GNFactor. Our project website is https://yanjieze.com/GNFactor/ .  ( 2 min )
    Hypergraph Structure Inference From Data Under Smoothness Prior. (arXiv:2308.14172v2 [cs.LG] UPDATED)
    Hypergraphs are important for processing data with higher-order relationships involving more than two entities. In scenarios where explicit hypergraphs are not readily available, it is desirable to infer a meaningful hypergraph structure from the node features to capture the intrinsic relations within the data. However, existing methods either adopt simple pre-defined rules that fail to precisely capture the distribution of the potential hypergraph structure, or learn a mapping between hypergraph structures and node features but require a large amount of labelled data, i.e., pre-existing hypergraph structures, for training. Both restrict their applications in practical scenarios. To fill this gap, we propose a novel smoothness prior that enables us to design a method to infer the probability for each potential hyperedge without labelled data as supervision. The proposed prior indicates features of nodes in a hyperedge are highly correlated by the features of the hyperedge containing them. We use this prior to derive the relation between the hypergraph structure and the node features via probabilistic modelling. This allows us to develop an unsupervised inference method to estimate the probability for each potential hyperedge via solving an optimisation problem that has an analytical solution. Experiments on both synthetic and real-world data demonstrate that our method can learn meaningful hypergraph structures from data more efficiently than existing hypergraph structure inference methods.
    StyleGAN as a Utility-Preserving Face De-identification Method. (arXiv:2212.02611v2 [cs.CV] UPDATED)
    Face de-identification methods have been proposed to preserve users' privacy by obscuring their faces. These methods, however, can degrade the quality of photos, and they usually do not preserve the utility of faces, i.e., their age, gender, pose, and facial expression. Recently, GANs, such as StyleGAN, have been proposed, which generate realistic, high-quality imaginary faces. In this paper, we investigate the use of StyleGAN in generating de-identified faces through style mixing. We examined this de-identification method for preserving utility and privacy by implementing several face detection, verification, and identification attacks and conducting a user study. The results from our extensive experiments, human evaluation, and comparison with two state-of-the-art methods, i.e., CIAGAN and DeepPrivacy, show that StyleGAN performs on par or better than these methods, preserving users' privacy and images' utility. In particular, the results of the machine learning-based experiments show that StyleGAN0-4 preserves utility better than CIAGAN and DeepPrivacy while preserving privacy at the same level. StyleGAN0-3 preserves utility at the same level while providing more privacy. In this paper, for the first time, we also performed a carefully designed user study to examine both privacy and utility-preserving properties of StyleGAN0-3, 0-4, and 0-5, as well as CIAGAN and DeepPrivacy from the human observers' perspectives. Our statistical tests showed that participants tend to verify and identify StyleGAN0-5 images more easily than DeepPrivacy images. All the methods but StyleGAN0-5 had significantly lower identification rates than CIAGAN. Regarding utility, as expected, StyleGAN0-5 performed significantly better in preserving some attributes. Among all methods, on average, participants believe gender has been preserved the most while naturalness has been preserved the least.
    Stochastic Configuration Machines for Industrial Artificial Intelligence. (arXiv:2308.13570v2 [cs.LG] UPDATED)
    Real-time predictive modelling with desired accuracy is highly expected in industrial artificial intelligence (IAI), where neural networks play a key role. Neural networks in IAI require powerful, high-performance computing devices to operate a large number of floating point data. Based on stochastic configuration networks (SCNs), this paper proposes a new randomized learner model, termed stochastic configuration machines (SCMs), to stress effective modelling and data size saving that are useful and valuable for industrial applications. Compared to SCNs and random vector functional-link (RVFL) nets with binarized implementation, the model storage of SCMs can be significantly compressed while retaining favourable prediction performance. Besides the architecture of the SCM learner model and its learning algorithm, as an important part of this contribution, we also provide a theoretical basis on the learning capacity of SCMs by analysing the model's complexity. Experimental studies are carried out over some benchmark datasets and three industrial applications. The results demonstrate that SCM has great potential for dealing with industrial data analytics.
    RBA-GCN: Relational Bilevel Aggregation Graph Convolutional Network for Emotion Recognition. (arXiv:2308.11029v2 [cs.AI] UPDATED)
    Emotion recognition in conversation (ERC) has received increasing attention from researchers due to its wide range of applications.As conversation has a natural graph structure,numerous approaches used to model ERC based on graph convolutional networks (GCNs) have yielded significant results.However,the aggregation approach of traditional GCNs suffers from the node information redundancy problem,leading to node discriminant information loss.Additionally,single-layer GCNs lack the capacity to capture long-range contextual information from the graph. Furthermore,the majority of approaches are based on textual modality or stitching together different modalities, resulting in a weak ability to capture interactions between modalities. To address these problems, we present the relational bilevel aggregation graph convolutional network (RBA-GCN), which consists of three modules: the graph generation module (GGM), similarity-based cluster building module (SCBM) and bilevel aggregation module (BiAM). First, GGM constructs a novel graph to reduce the redundancy of target node information.Then,SCBM calculates the node similarity in the target node and its structural neighborhood, where noisy information with low similarity is filtered out to preserve the discriminant information of the node. Meanwhile, BiAM is a novel aggregation method that can preserve the information of nodes during the aggregation process. This module can construct the interaction between different modalities and capture long-range contextual information based on similarity clusters. On both the IEMOCAP and MELD datasets, the weighted average F1 score of RBA-GCN has a 2.17$\sim$5.21\% improvement over that of the most advanced method.Our code is available at https://github.com/luftmenscher/RBA-GCN and our article with the same name has been published in IEEE/ACM Transactions on Audio,Speech,and Language Processing,vol.31,2023
    xxMD: Benchmarking Neural Force Fields Using Extended Dynamics beyond Equilibrium. (arXiv:2308.11155v2 [cs.LG] UPDATED)
    Neural force fields (NFFs) have gained prominence in computational chemistry as surrogate models, superseding quantum-chemistry calculations in ab initio molecular dynamics. The prevalent benchmark for NFFs has been the MD17 dataset and its subsequent extension. These datasets predominantly comprise geometries from the equilibrium region of the ground electronic state potential energy surface, sampling from direct adiabatic dynamics. However, many chemical reactions entail significant molecular deformations, notably bond breaking. We demonstrate the constrained distribution of internal coordinates and energies in the MD17 datasets, underscoring their inadequacy for representing systems undergoing chemical reactions. Addressing this sampling limitation, we introduce the xxMD (Extended Excited-state Molecular Dynamics) dataset, derived from non-adiabatic dynamics. This dataset encompasses energies and forces ascertained from both multireference wave function theory and density functional theory. Furthermore, its nuclear configuration spaces authentically depict chemical reactions, making xxMD a more chemically relevant dataset. Our re-assessment of equivariant models on the xxMD datasets reveals notably higher mean absolute errors than those reported for MD17 and its variants. This observation underscores the challenges faced in crafting a generalizable NFF model with extrapolation capability. Our proposed xxMD-CASSCF and xxMD-DFT datasets are available at https://github.com/zpengmei/xxMD.
    Expressive Text-to-Image Generation with Rich Text. (arXiv:2304.06720v2 [cs.CV] UPDATED)
    Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on attention maps of a diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance, and maintain its fidelity against plain-text generation through region-based injections. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.
    A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks. (arXiv:2304.14994v2 [cs.LG] UPDATED)
    Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic forgetting impairs the applicability of this approach to initial value problems (IVPs). In an alternative local-in-time approach, the optimization problem can be converted into an ordinary differential equation (ODE) on the network parameters and the solution propagated forward in time; however, we demonstrate that current methods based on this approach suffer from two key issues. First, following the ODE produces an uncontrolled growth in the conditioning of the problem, ultimately leading to unacceptably large numerical errors. Second, as the ODE methods scale cubically with the number of model parameters, they are restricted to small neural networks, significantly limiting their ability to represent intricate PDE initial conditions and solutions. Building on these insights, we develop Neural IVP, an ODE based IVP solver which prevents the network from getting ill-conditioned and runs in time linear in the number of parameters, enabling us to evolve the dynamics of challenging PDEs with neural networks.
    Symmetry-Preserving Program Representations for Learning Code Semantics. (arXiv:2308.03312v5 [cs.LG] UPDATED)
    Large Language Models (LLMs) have shown promise in automated program reasoning, a crucial aspect of many security tasks. However, existing LLM architectures for code are often borrowed from other domains like natural language processing, raising concerns about their generalization and robustness to unseen code. A key generalization challenge is to incorporate the knowledge of code semantics, including control and data flow, into the LLM architectures. Drawing inspiration from examples of convolution layers exploiting translation symmetry, we explore how code symmetries can enhance LLM architectures for program analysis and modeling. We present a rigorous group-theoretic framework that formally defines code symmetries as semantics-preserving transformations and provides techniques for precisely reasoning about symmetry preservation within LLM architectures. Using this framework, we introduce a novel variant of self-attention that preserves program symmetries, demonstrating its effectiveness in generalization and robustness through detailed experimental evaluations across different binary and source code analysis tasks. Overall, our code symmetry framework offers rigorous and powerful reasoning techniques that can guide the future development of specialized LLMs for code and advance LLM-guided program reasoning tasks.
    Adaptive Uncertainty-Guided Model Selection for Data-Driven PDE Discovery. (arXiv:2308.10283v2 [cs.LG] UPDATED)
    We propose a new parameter-adaptive uncertainty-penalized Bayesian information criterion (UBIC) to prioritize the parsimonious partial differential equation (PDE) that sufficiently governs noisy spatial-temporal observed data with few reliable terms. Since the naive use of the BIC for model selection has been known to yield an undesirable overfitted PDE, the UBIC penalizes the found PDE not only by its complexity but also the quantified uncertainty, derived from the model supports' coefficient of variation in a probabilistic view. We also introduce physics-informed neural network learning as a simulation-based approach to further validate the selected PDE flexibly against the other discovered PDE. Numerical results affirm the successful application of the UBIC in identifying the true governing PDE. Additionally, we reveal an interesting effect of denoising the observed data on improving the trade-off between the BIC score and model complexity. Code is available at https://github.com/Pongpisit-Thanasutives/UBIC.
    Pre-Training Representations of Binary Code Using Contrastive Learning. (arXiv:2210.05102v2 [cs.SE] UPDATED)
    Compiled software is delivered as executable binary code. Developers write source code to express the software semantics, but the compiler converts it to a binary format that the CPU can directly execute. Therefore, binary code analysis is critical to applications in reverse engineering and computer security tasks where source code is not available. However, unlike source code and natural language that contain rich semantic information, binary code is typically difficult for human engineers to understand and analyze. While existing work uses AI models to assist source code analysis, few studies have considered binary code. In this paper, we propose a COntrastive learning Model for Binary cOde Analysis, or COMBO, that incorporates source code and comment information into binary code during representation learning. Specifically, we present three components in COMBO: (1) a primary contrastive learning method for cold-start pre-training, (2) a simplex interpolation method to incorporate source code, comments, and binary code, and (3) an intermediate representation learning algorithm to provide binary code embeddings. Finally, we evaluate the effectiveness of the pre-trained representations produced by COMBO using three indicative downstream tasks relating to binary code: algorithmic functionality classification, binary code similarity, and vulnerability detection. Our experimental results show that COMBO facilitates representation learning of binary code visualized by distribution analysis, and improves the performance on all three downstream tasks by 5.45% on average compared to state-of-the-art large-scale language representation models. To the best of our knowledge, COMBO is the first language representation model that incorporates source code, binary code, and comments into contrastive code representation learning and unifies multiple tasks for binary code analysis.
    Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning. (arXiv:2303.08566v2 [cs.CV] UPDATED)
    Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative for full fine-tuning so as to adapt pre-trained vision models to downstream tasks, which only tunes a small number of parameters while freezing the vast majority ones to ease storage burden and optimization difficulty. However, existing PEFT methods introduce trainable parameters to the same positions across different tasks depending solely on human heuristics and neglect the domain gaps. To this end, we study where to introduce and how to allocate trainable parameters by proposing a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme, which adaptively allocates trainable parameters to task-specific important positions given a desired tunable parameter budget. Specifically, our SPT first quickly identifies the sensitive parameters that require tuning for a given task in a data-dependent way. Next, our SPT further boosts the representational capability for the weight matrices whose number of sensitive parameters exceeds a pre-defined threshold by utilizing existing structured tuning methods, e.g., LoRA [23] or Adapter [22], to replace directly tuning the selected sensitive parameters (unstructured tuning) under the budget. Extensive experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods and largely boosts their performance, e.g., SPT improves Adapter with supervised pre-trained ViT-B/16 backbone by 4.2% and 1.4% mean Top-1 accuracy, reaching SOTA performance on FGVC and VTAB-1k benchmarks, respectively. Source code is at https://github.com/ziplab/SPT
    Quantization-based Optimization with Perspective of Quantum Mechanics. (arXiv:2308.11594v2 [quant-ph] UPDATED)
    Statistical and stochastic analysis based on thermodynamics has been the main analysis framework for stochastic global optimization. Recently, appearing quantum annealing or quantum tunneling algorithm for global optimization, we require a new researching framework for global optimization algorithms. In this paper, we provide the analysis for quantization-based optimization based on the Schr\"odinger equation to reveal what property in quantum mechanics enables global optimization. We present that the tunneling effect derived by the Schr\"odinger equation in quantization-based optimization enables to escape of a local minimum. Additionally, we confirm that this tunneling effect is the same property included in quantum mechanics-based global optimization. Experiments with standard multi-modal benchmark functions represent that the proposed analysis is valid.
    Flexible Phase Dynamics for Bio-Plausible Contrastive Learning. (arXiv:2302.12431v2 [cs.LG] UPDATED)
    Many learning algorithms used as normative models in neuroscience or as candidate approaches for learning on neuromorphic chips learn by contrasting one set of network states with another. These Contrastive Learning (CL) algorithms are traditionally implemented with rigid, temporally non-local, and periodic learning dynamics that could limit the range of physical systems capable of harnessing CL. In this study, we build on recent work exploring how CL might be implemented by biological or neurmorphic systems and show that this form of learning can be made temporally local, and can still function even if many of the dynamical requirements of standard training procedures are relaxed. Thanks to a set of general theorems corroborated by numerical experiments across several CL models, our results provide theoretical foundations for the study and development of CL methods for biological and neuromorphic neural networks.
    Metropolitan Segment Traffic Speeds from Massive Floating Car Data in 10 Cities. (arXiv:2302.08761v3 [cs.LG] UPDATED)
    Traffic analysis is crucial for urban operations and planning, while the availability of dense urban traffic data beyond loop detectors is still scarce. We present a large-scale floating vehicle dataset of per-street segment traffic information, Metropolitan Segment Traffic Speeds from Massive Floating Car Data in 10 Cities (MeTS-10), available for 10 global cities with a 15-minute resolution for collection periods ranging between 108 and 361 days in 2019-2021 and covering more than 1500 square kilometers per metropolitan area. MeTS-10 features traffic speed information at all street levels from main arterials to local streets for Antwerp, Bangkok, Barcelona, Berlin, Chicago, Istanbul, London, Madrid, Melbourne and Moscow. The dataset leverages the industrial-scale floating vehicle Traffic4cast data with speeds and vehicle counts provided in a privacy-preserving spatio-temporal aggregation. We detail the efficient matching approach mapping the data to the OpenStreetMap road graph. We evaluate the dataset by comparing it with publicly available stationary vehicle detector data (for Berlin, London, and Madrid) and the Uber traffic speed dataset (for Barcelona, Berlin, and London). The comparison highlights the differences across datasets in spatio-temporal coverage and variations in the reported traffic caused by the binning method. MeTS-10 enables novel, city-wide analysis of mobility and traffic patterns for ten major world cities, overcoming current limitations of spatially sparse vehicle detector data. The large spatial and temporal coverage offers an opportunity for joining the MeTS-10 with other datasets, such as traffic surveys in traffic planning studies or vehicle detector data in traffic control settings.
    When Deep Learning Meets Polyhedral Theory: A Survey. (arXiv:2305.00241v2 [math.OC] UPDATED)
    In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks in tasks such as computer vision and natural language processing. Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions such as the Rectified Linear Unit (ReLU), which became the most commonly used type of activation function in neural networks. That made certain types of network structure $\unicode{x2014}$such as the typical fully-connected feedforward neural network$\unicode{x2014}$ amenable to analysis through polyhedral theory and to the application of methodologies such as Linear Programming (LP) and Mixed-Integer Linear Programming (MILP) for a variety of purposes. In this paper, we survey the main topics emerging from this fast-paced area of work, which bring a fresh perspective to understanding neural networks in more detail as well as to applying linear optimization techniques to train, verify, and reduce the size of such networks.
    Principled Pruning of Bayesian Neural Networks through Variational Free Energy Minimization. (arXiv:2210.09134v2 [cs.LG] UPDATED)
    Bayesian model reduction provides an efficient approach for comparing the performance of all nested sub-models of a model, without re-evaluating any of these sub-models. Until now, Bayesian model reduction has been applied mainly in the computational neuroscience community on simple models. In this paper, we formulate and apply Bayesian model reduction to perform principled pruning of Bayesian neural networks, based on variational free energy minimization. Direct application of Bayesian model reduction, however, gives rise to approximation errors. Therefore, a novel iterative pruning algorithm is presented to alleviate the problems arising with naive Bayesian model reduction, as supported experimentally on the publicly available UCI datasets for different inference algorithms. This novel parameter pruning scheme solves the shortcomings of current state-of-the-art pruning methods that are used by the signal processing community. The proposed approach has a clear stopping criterion and minimizes the same objective that is used during training. Next to these benefits, our experiments indicate better model performance in comparison to state-of-the-art pruning schemes.
    StyleDiff: Attribute Comparison Between Unlabeled Datasets in Latent Disentangled Space. (arXiv:2303.05102v2 [stat.ML] UPDATED)
    One major challenge in machine learning applications is coping with mismatches between the datasets used in the development and those obtained in real-world applications. These mismatches may lead to inaccurate predictions and errors, resulting in poor product quality and unreliable systems. In this study, we propose StyleDiff to inform developers of the differences between the two datasets for the steady development of machine learning systems. Using disentangled image spaces obtained from recently proposed generative models, StyleDiff compares the two datasets by focusing on attributes in the images and provides an easy-to-understand analysis of the differences between the datasets. The proposed StyleDiff performs in $O (d N\log N)$, where $N$ is the size of the datasets and $d$ is the number of attributes, enabling the application to large datasets. We demonstrate that StyleDiff accurately detects differences between datasets and presents them in an understandable format using, for example, driving scenes datasets.
    DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion. (arXiv:2303.12743v4 [cs.CV] UPDATED)
    In autonomous driving, data augmentation is commonly used for improving 3D object detection. The most basic methods include insertion of copied objects and rotation and scaling of the entire training frame. Numerous variants have been developed as well. The existing methods, however, are considerably limited when compared to the variety of the real world possibilities. In this work, we develop a diversified and realistic augmentation method that can flexibly construct a whole-body object, freely locate and rotate the object, and apply self-occlusion and external-occlusion accordingly. To improve the diversity of the whole-body object construction, we develop an iterative method that stochastically combines multiple objects observed from the real world into a single object. Unlike the existing augmentation methods, the constructed objects can be randomly located and rotated in the training frame because proper occlusions can be reflected to the whole-body objects in the final step. Finally, proper self-occlusion at each local object level and external-occlusion at the global frame level are applied using the Hidden Point Removal (HPR) algorithm that is computationally efficient. HPR is also used for adaptively controlling the point density of each object according to the object's distance from the LiDAR. Experiment results show that the proposed DR.CPO algorithm is data-efficient and model-agnostic without incurring any computational overhead. Also, DR.CPO can improve mAP performance by 2.08% when compared to the best 3D detection result known for KITTI dataset. The code is available at https://github.com/SNU-DRL/DRCPO.git
    Learning Melanocytic Cell Masks from Adjacent Stained Tissue. (arXiv:2211.00646v3 [q-bio.QM] UPDATED)
    Melanoma is one of the most aggressive forms of skin cancer, causing a large proportion of skin cancer deaths. However, melanoma diagnoses by pathologists shows low interrater reliability. As melanoma is a cancer of the melanocyte, there is a clear need to develop a melanocytic cell segmentation tool that is agnostic to pathologist variability and automates pixel-level annotation. Gigapixel-level pathologist labeling, however, is impractical. Herein, we propose a means to train deep neural networks for melanocytic cell segmentation from hematoxylin and eosin (H&E) stained sections and paired immunohistochemistry (IHC) of adjacent tissue sections, achieving a mean IOU of 0.64 despite imperfect ground-truth labels.
    On-Demand Communication for Asynchronous Multi-Agent Bandits. (arXiv:2302.07446v2 [cs.LG] UPDATED)
    This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where agents operate asynchronously -- agent pull times and rates are unknown, irregular, and heterogeneous -- and face the same instance of a K-armed bandit problem. Agents can share reward information to speed up the learning process at additional communication costs. We propose ODC, an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times. ODC is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of agents. ODC is a generic protocol that can be integrated into most cooperative bandit algorithms without degrading their performance. We then incorporate ODC into the natural extensions of UCB and AAE algorithms and propose two communication-efficient cooperative algorithms. Our analysis shows that both algorithms are near-optimal in regret.
    0/1 Deep Neural Networks via Block Coordinate Descent. (arXiv:2206.09379v2 [cs.LG] UPDATED)
    The step function is one of the simplest and most natural activation functions for deep neural networks (DNNs). As it counts 1 for positive variables and 0 for others, its intrinsic characteristics (e.g., discontinuity and no viable information of subgradients) impede its development for several decades. Even if there is an impressive body of work on designing DNNs with continuous activation functions that can be deemed as surrogates of the step function, it is still in the possession of some advantageous properties, such as complete robustness to outliers and being capable of attaining the best learning-theoretic guarantee of predictive accuracy. Hence, in this paper, we aim to train DNNs with the step function used as an activation function (dubbed as 0/1 DNNs). We first reformulate 0/1 DNNs as an unconstrained optimization problem and then solve it by a block coordinate descend (BCD) method. Moreover, we acquire closed-form solutions for sub-problems of BCD as well as its convergence properties. Furthermore, we also integrate $\ell_{2,0}$-regularization into 0/1 DNN to accelerate the training process and compress the network scale. As a result, the proposed algorithm has a high performance on classifying MNIST and Fashion-MNIST datasets. As a result, the proposed algorithm has a desirable performance on classifying MNIST, FashionMNIST, Cifar10, and Cifar100 datasets.
    Extending regionalization algorithms to explore spatial process heterogeneity. (arXiv:2206.09429v4 [stat.ME] UPDATED)
    In spatial regression models, spatial heterogeneity may be considered with either continuous or discrete specifications. The latter is related to delineation of spatially connected regions with homogeneous relationships between variables (spatial regimes). Although various regionalization algorithms have been proposed and studied in the field of spatial analytics, methods to optimize spatial regimes have been largely unexplored. In this paper, we propose two new algorithms for spatial regime delineation, two-stage K-Models and Regional-K-Models. We also extend the classic Automatic Zoning Procedure to spatial regression context. The proposed algorithms are applied to a series of synthetic datasets and two real-world datasets. Results indicate that all three algorithms achieve superior or comparable performance to existing approaches, while the two-stage K-Models algorithm largely outperforms existing approaches on model fitting, region reconstruction, and coefficient estimation. Our work enriches the spatial analytics toolbox to explore spatial heterogeneous processes.
    Hypernetwork approach to Bayesian MAML. (arXiv:2210.02796v2 [cs.LG] UPDATED)
    The main goal of Few-Shot learning algorithms is to enable learning from small amounts of data. One of the most popular and elegant Few-Shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn the shared universal weights of a meta-model, which are then adapted for specific tasks. However, the method suffers from over-fitting and poorly quantifies uncertainty due to limited data size. Bayesian approaches could, in principle, alleviate these shortcomings by learning weight distributions in place of point-wise weights. Unfortunately, previous modifications of MAML are limited due to the simplicity of Gaussian posteriors, MAML-like gradient-based weight updates, or by the same structure enforced for universal and adapted weights. In this paper, we propose a novel framework for Bayesian MAML called BayesianHMAML, which employs Hypernetworks for weight updates. It learns the universal weights point-wise, but a probabilistic structure is added when adapted for specific tasks. In such a framework, we can use simple Gaussian distributions or more complicated posteriors induced by Continuous Normalizing Flows.
    Federated Adaptive Prompt Tuning for Multi-domain Collaborative Learning. (arXiv:2211.07864v2 [cs.LG] UPDATED)
    Federated learning (FL) enables multiple clients to collaboratively train a global model without disclosing their data. Previous researches often require training the complete model parameters. However, the emergence of powerful pre-trained models makes it possible to achieve higher performance with fewer learnable parameters in FL. In this paper, we propose a federated adaptive prompt tuning algorithm, FedAPT, for multi-domain collaborative image classification with powerful foundation models, like CLIP. Compared with direct federated prompt tuning, our core idea is to adaptively unlock specific domain knowledge for each test sample in order to provide them with personalized prompts. To implement this idea, we design an adaptive prompt tuning module, which consists of a meta prompt, an adaptive network, and some keys. The server randomly generates a set of keys and assigns a unique key to each client. Then all clients cooperatively train the global adaptive network and meta prompt with the local datasets and the frozen keys. Ultimately, the global aggregation model can assign a personalized prompt to CLIP based on the domain features of each test sample. We perform extensive experiments on two multi-domain image classification datasets across two different settings - supervised and unsupervised. The results show that FedAPT can achieve better performance with less than 10\% of the number of parameters of the fully trained model, and the global model can perform well in diverse client domains simultaneously.
    Point Cloud-based Proactive Link Quality Prediction for Millimeter-wave Communications. (arXiv:2301.00752v3 [cs.NI] UPDATED)
    This study demonstrates the feasibility of point cloud-based proactive link quality prediction for millimeter-wave (mmWave) communications. Previous studies have proposed machine learning-based methods to predict received signal strength for future time periods using time series of depth images to mitigate the line-of-sight (LOS) path blockage by pedestrians in mmWave communication. However, these image-based methods have limited applicability due to privacy concerns as camera images may contain sensitive information. This study proposes a point cloud-based method for mmWave link quality prediction and demonstrates its feasibility through experiments. Point clouds represent three-dimensional (3D) spaces as a set of points and are sparser and less likely to contain sensitive information than camera images. Additionally, point clouds provide 3D position and motion information, which is necessary for understanding the radio propagation environment involving pedestrians. This study designs the mmWave link quality prediction method and conducts realistic indoor experiments, where the link quality fluctuates significantly due to human blockage, using commercially available IEEE 802.11ad-based 60 GHz wireless LAN devices and Kinect v2 RGB-D camera and Velodyne VLP-16 light detection and ranging (LiDAR) for point cloud acquisition. The experimental results showed that our proposed method can predict future large attenuation of mmWave received signal strength and throughput induced by the LOS path blockage by pedestrians with comparable or superior accuracy to image-based prediction methods. Hence, our point cloud-based method can serve as a viable alternative to image-based methods.
    Sequential Informed Federated Unlearning: Efficient and Provable Client Unlearning in Federated Optimization. (arXiv:2211.11656v4 [cs.LG] UPDATED)
    The aim of Machine Unlearning (MU) is to provide theoretical guarantees on the removal of the contribution of a given data point from a training procedure. Federated Unlearning (FU) consists in extending MU to unlearn a given client's contribution from a federated training routine. Current FU approaches are generally not scalable, and do not come with sound theoretical quantification of the effectiveness of unlearning. In this work we present Informed Federated Unlearning (IFU), a novel efficient and quantifiable FU approach. Upon unlearning request from a given client, IFU identifies the optimal FL iteration from which FL has to be reinitialized, with unlearning guarantees obtained through a randomized perturbation mechanism. The theory of IFU is also extended to account for sequential unlearning requests. Experimental results on different tasks and dataset show that IFU leads to more efficient unlearning procedures as compared to basic re-training and state-of-the-art FU approaches.
    Simulation-Based Optimization of User Interfaces for Quality-Assuring Machine Learning Model Predictions. (arXiv:2104.01129v2 [cs.HC] UPDATED)
    Quality-sensitive applications of machine learning (ML) require quality assurance (QA) by humans before the predictions of an ML model can be deployed. QA for ML (QA4ML) interfaces require users to view a large amount of data and perform many interactions to correct errors made by the ML model. An optimized user interface (UI) can significantly reduce interaction costs. While UI optimization can be informed by user studies evaluating design options, this approach is not scalable because there are typically numerous small variations that can affect the efficiency of a QA4ML interface. Hence, we propose using simulation to evaluate and aid the optimization of QA4ML interfaces. In particular, we focus on simulating the combined effects of human intelligence in initiating appropriate interaction commands and machine intelligence in providing algorithmic assistance for accelerating QA4ML processes. As QA4ML is usually labor-intensive, we use the simulated task completion time as the metric for UI optimization under different interface and algorithm setups. We demonstrate the usage of this UI design method in several QA4ML applications.
    Combining Inductive and Deductive Reasoning for Query Answering over Incomplete Knowledge Graphs. (arXiv:2106.14052v2 [cs.AI] UPDATED)
    Current methods for embedding-based query answering over incomplete Knowledge Graphs (KGs) only focus on inductive reasoning, i.e., predicting answers by learning patterns from the data, and lack the complementary ability to do deductive reasoning, which requires the application of domain knowledge to infer further information. To address this shortcoming, we investigate the problem of incorporating ontologies into embedding-based query answering models by defining the task of embedding-based ontology-mediated query answering. We propose various integration strategies into prominent representatives of embedding models that involve (1) different ontology-driven data augmentation techniques and (2) adaptation of the loss function to enforce the ontology axioms. We design novel benchmarks for the considered task based on the LUBM and the NELL KGs and evaluate our methods on them. The achieved improvements in the setting that requires both inductive and deductive reasoning are from 20% to 55% in HITS@3.
    Natural Quantum Monte Carlo Computation of Excited States. (arXiv:2308.16848v1 [physics.comp-ph])
    We present a variational Monte Carlo algorithm for estimating the lowest excited states of a quantum system which is a natural generalization of the estimation of ground states. The method has no free parameters and requires no explicit orthogonalization of the different states, instead transforming the problem of finding excited states of a given system into that of finding the ground state of an expanded system. Expected values of arbitrary observables can be calculated, including off-diagonal expectations between different states such as the transition dipole moment. Although the method is entirely general, it works particularly well in conjunction with recent work on using neural networks as variational Ansatze for many-electron systems, and we show that by combining this method with the FermiNet and Psiformer Ansatze we can accurately recover vertical excitation energies and oscillator strengths on molecules as large as benzene. Beyond the examples on molecules presented here, we expect this technique will be of great interest for applications of variational quantum Monte Carlo to atomic, nuclear and condensed matter physics.
    Visual correspondence-based explanations improve AI robustness and human-AI team accuracy. (arXiv:2208.00780v5 [cs.CV] UPDATED)
    Explaining artificial intelligence (AI) predictions is increasingly important and even imperative in many high-stakes applications where humans are the ultimate decision-makers. In this work, we propose two novel architectures of self-interpretable image classifiers that first explain, and then predict (as opposed to post-hoc explanations) by harnessing the visual correspondences between a query image and exemplars. Our models consistently improve (by 1 to 4 points) on out-of-distribution (OOD) datasets while performing marginally worse (by 1 to 2 points) on in-distribution tests than ResNet-50 and a $k$-nearest neighbor classifier (kNN). Via a large-scale, human study on ImageNet and CUB, our correspondence-based explanations are found to be more useful to users than kNN explanations. Our explanations help users more accurately reject AI's wrong decisions than all other tested methods. Interestingly, for the first time, we show that it is possible to achieve complementary human-AI team accuracy (i.e., that is higher than either AI-alone or human-alone), in ImageNet and CUB image classification tasks.
    PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction. (arXiv:2308.16896v1 [cs.CV])
    Semantic segmentation in autonomous driving has been undergoing an evolution from sparse point segmentation to dense voxel segmentation, where the objective is to predict the semantic occupancy of each voxel in the concerned 3D space. The dense nature of the prediction space has rendered existing efficient 2D-projection-based methods (e.g., bird's eye view, range view, etc.) ineffective, as they can only describe a subspace of the 3D scene. To address this, we propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively and a PointOcc model to process them efficiently. Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system for more fine-grained modeling of nearer areas. We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane. Finally, we obtain the features of each point by aggregating its projected features on each of the processed TPV planes without the need for any post-processing. Extensive experiments on both 3D occupancy prediction and LiDAR segmentation benchmarks demonstrate that the proposed PointOcc achieves state-of-the-art performance with much faster speed. Specifically, despite only using LiDAR, PointOcc significantly outperforms all other methods, including multi-modal methods, with a large margin on the OpenOccupancy benchmark. Code: https://github.com/wzzheng/PointOcc.
    FedDD: Toward Communication-efficient Federated Learning with Differential Parameter Dropout. (arXiv:2308.16835v1 [cs.LG])
    Federated Learning (FL) requires frequent exchange of model parameters, which leads to long communication delay, especially when the network environments of clients vary greatly. Moreover, the parameter server needs to wait for the slowest client (i.e., straggler, which may have the largest model size, lowest computing capability or worst network condition) to upload parameters, which may significantly degrade the communication efficiency. Commonly-used client selection methods such as partial client selection would lead to the waste of computing resources and weaken the generalization of the global model. To tackle this problem, along a different line, in this paper, we advocate the approach of model parameter dropout instead of client selection, and accordingly propose a novel framework of Federated learning scheme with Differential parameter Dropout (FedDD). FedDD consists of two key modules: dropout rate allocation and uploaded parameter selection, which will optimize the model parameter uploading ratios tailored to different clients' heterogeneous conditions and also select the proper set of important model parameters for uploading subject to clients' dropout rate constraints. Specifically, the dropout rate allocation is formulated as a convex optimization problem, taking system heterogeneity, data heterogeneity, and model heterogeneity among clients into consideration. The uploaded parameter selection strategy prioritizes on eliciting important parameters for uploading to speedup convergence. Furthermore, we theoretically analyze the convergence of the proposed FedDD scheme. Extensive performance evaluations demonstrate that the proposed FedDD scheme can achieve outstanding performances in both communication efficiency and model convergence, and also possesses a strong generalization capability to data of rare classes.
    Learning Optimal Strategies for Temporal Tasks in Stochastic Games. (arXiv:2102.04307v3 [cs.AI] UPDATED)
    Synthesis from linear temporal logic (LTL) specifications provides assured controllers for systems operating in stochastic and potentially adversarial environments. Automatic synthesis tools, however, require a model of the environment to construct controllers. In this work, we introduce a model-free reinforcement learning (RL) approach to derive controllers from given LTL specifications even when the environment is completely unknown. We model the problem as a stochastic game (SG) between the controller and the adversarial environment; we then learn optimal control strategies that maximize the probability of satisfying the LTL specifications against the worst-case environment behavior. We first construct a product game using the deterministic parity automaton (DPA) translated from the given LTL specification. By deriving distinct rewards and discount factors from the acceptance condition of the DPA, we reduce the maximization of the worst-case probability of satisfying the LTL specification into the maximization of a discounted reward objective in the product game; this enables the use of model-free RL algorithms to learn an optimal controller strategy. To deal with the common scalability problems when the number of sets defining the acceptance condition of the DPA (usually referred as colors), is large, we propose a lazy color generation method where distinct rewards and discount factors are utilized only when needed, and an approximate method where the controller eventually focuses on only one color. In several case studies, we show that our approach is scalable to a wide range of LTL formulas, significantly outperforming existing methods for learning controllers from LTL specifications in SGs.
    Leveraging Image-based Generative Adversarial Networks for Time Series Generation. (arXiv:2112.08060v2 [cs.LG] UPDATED)
    Generative models for images have gained significant attention in computer vision and natural language processing due to their ability to generate realistic samples from complex data distributions. To leverage the advances of image-based generative models for the time series domain, we propose a two-dimensional image representation for time series, the Extended Intertemporal Return Plot (XIRP). Our approach captures the intertemporal time series dynamics in a scale-invariant and invertible way, reducing training time and improving sample quality. We benchmark synthetic XIRPs obtained by an off-the-shelf Wasserstein GAN with gradient penalty (WGAN-GP) to other image representations and models regarding similarity and predictive ability metrics. Our novel, validated image representation for time series consistently and significantly outperforms a state-of-the-art RNN-based generative model regarding predictive ability. Further, we introduce an improved stochastic inversion to substantially improve simulation quality regardless of the representation and provide the prospect of transfer potentials in other domains.
    Dynamical systems' based neural networks. (arXiv:2210.02373v2 [cs.LG] UPDATED)
    Neural networks have gained much interest because of their effectiveness in many applications. However, their mathematical properties are generally not well understood. If there is some underlying geometric structure inherent to the data or to the function to approximate, it is often desirable to take this into account in the design of the neural network. In this work, we start with a non-autonomous ODE and build neural networks using a suitable, structure-preserving, numerical time-discretisation. The structure of the neural network is then inferred from the properties of the ODE vector field. Besides injecting more structure into the network architectures, this modelling procedure allows a better theoretical understanding of their behaviour. We present two universal approximation results and demonstrate how to impose some particular properties on the neural networks. A particular focus is on 1-Lipschitz architectures including layers that are not 1-Lipschitz. These networks are expressive and robust against adversarial attacks, as shown for the CIFAR-10 and CIFAR-100 datasets.
    Branches of a Tree: Taking Derivatives of Programs with Discrete and Branching Randomness in High Energy Physics. (arXiv:2308.16680v1 [stat.ML])
    We propose to apply several gradient estimation techniques to enable the differentiation of programs with discrete randomness in High Energy Physics. Such programs are common in High Energy Physics due to the presence of branching processes and clustering-based analysis. Thus differentiating such programs can open the way for gradient based optimization in the context of detector design optimization, simulator tuning, or data analysis and reconstruction optimization. We discuss several possible gradient estimation strategies, including the recent Stochastic AD method, and compare them in simplified detector design experiments. In doing so we develop, to the best of our knowledge, the first fully differentiable branching program.
    Latent Variable Multi-output Gaussian Processes for Hierarchical Datasets. (arXiv:2308.16822v1 [cs.LG])
    Multi-output Gaussian processes (MOGPs) have been introduced to deal with multiple tasks by exploiting the correlations between different outputs. Generally, MOGPs models assume a flat correlation structure between the outputs. However, such a formulation does not account for more elaborate relationships, for instance, if several replicates were observed for each output (which is a typical setting in biological experiments). This paper proposes an extension of MOGPs for hierarchical datasets (i.e. datasets for which the relationships between observations can be represented within a tree structure). Our model defines a tailored kernel function accounting for hierarchical structures in the data to capture different levels of correlations while leveraging the introduction of latent variables to express the underlying dependencies between outputs through a dedicated kernel. This latter feature is expected to significantly improve scalability as the number of tasks increases. An extensive experimental study involving both synthetic and real-world data from genomics and motion capture is proposed to support our claims.
    Autoencoder-based Online Data Quality Monitoring for the CMS Electromagnetic Calorimeter. (arXiv:2308.16659v1 [physics.ins-det])
    The online Data Quality Monitoring system (DQM) of the CMS electromagnetic calorimeter (ECAL) is a crucial operational tool that allows ECAL experts to quickly identify, localize, and diagnose a broad range of detector issues that would otherwise hinder physics-quality data taking. Although the existing ECAL DQM system has been continuously updated to respond to new problems, it remains one step behind newer and unforeseen issues. Using unsupervised deep learning, a real-time autoencoder-based anomaly detection system is developed that is able to detect ECAL anomalies unseen in past data. After accounting for spatial variations in the response of the ECAL and the temporal evolution of anomalies, the new system is able to efficiently detect anomalies while maintaining an estimated false discovery rate between $10^{-2}$ to $10^{-4}$, beating existing benchmarks by about two orders of magnitude. The real-world performance of the system is validated using anomalies found in 2018 and 2022 LHC collision data. Additionally, first results from deploying the autoencoder-based system in the CMS online DQM workflow for the ECAL barrel during Run 3 of the LHC are presented, showing its promising performance in detecting obscure issues that could have been missed in the existing DQM system.
    Prediction of Diblock Copolymer Morphology via Machine Learning. (arXiv:2308.16886v1 [physics.chem-ph])
    A machine learning approach is presented to accelerate the computation of block polymer morphology evolution for large domains over long timescales. The strategy exploits the separation of characteristic times between coarse-grained particle evolution on the monomer scale and slow morphological evolution over mesoscopic scales. In contrast to empirical continuum models, the proposed approach learns stochastically driven defect annihilation processes directly from particle-based simulations. A UNet architecture that respects different boundary conditions is adopted, thereby allowing periodic and fixed substrate boundary conditions of arbitrary shape. Physical concepts are also introduced via the loss function and symmetries are incorporated via data augmentation. The model is validated using three different use cases. Explainable artificial intelligence methods are applied to visualize the morphology evolution over time. This approach enables the generation of large system sizes and long trajectories to investigate defect densities and their evolution under different types of confinement. As an application, we demonstrate the importance of accessing late-stage morphologies for understanding particle diffusion inside a single block. This work has implications for directed self-assembly and materials design in micro-electronics, battery materials, and membranes.
    Everything, Everywhere All in One Evaluation: Using Multiverse Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness. (arXiv:2308.16681v1 [stat.ML])
    A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. When designed well, these systems promise more objective decisions while saving large amounts of resources and freeing up human time. However, when ADM systems are not designed well, they can lead to unfair decisions which discriminate against societal groups. The downstream effects of ADMs critically depend on the decisions made during the systems' design and implementation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these design decisions are made implicitly, without knowing exactly how they will influence the final system. It is therefore important to make explicit the decisions made during the design of ADM systems and understand how these decisions affect the fairness of the resulting system. To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit design decisions into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible "universes" of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can see how and which decisions impact fairness. We demonstrate how multiverse analyses can be used to better understand variability and robustness of algorithmic fairness using an exemplary case study of predicting public health coverage of vulnerable populations for potential interventions. Our results illustrate how decisions during the design of a machine learning system can have surprising effects on its fairness and how to detect these effects using multiverse analysis.
    Diffusion Models for Interferometric Satellite Aperture Radar. (arXiv:2308.16847v1 [cs.CV])
    Probabilistic Diffusion Models (PDMs) have recently emerged as a very promising class of generative models, achieving high performance in natural image generation. However, their performance relative to non-natural images, like radar-based satellite data, remains largely unknown. Generating large amounts of synthetic (and especially labelled) satellite data is crucial to implement deep-learning approaches for the processing and analysis of (interferometric) satellite aperture radar data. Here, we leverage PDMs to generate several radar-based satellite image datasets. We show that PDMs succeed in generating images with complex and realistic structures, but that sampling time remains an issue. Indeed, accelerated sampling strategies, which work well on simple image datasets like MNIST, fail on our radar datasets. We provide a simple and versatile open-source https://github.com/thomaskerdreux/PDM_SAR_InSAR_generation to train, sample and evaluate PDMs using any dataset on a single GPU.
    The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants. (arXiv:2308.16884v1 [cs.CL])
    We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers. The questions were carefully curated to discriminate between models with different levels of general language comprehension. The English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs). We present extensive results and find that despite significant cross-lingual transfer in English-centric LLMs, much smaller MLMs pretrained on balanced multilingual data still understand far more languages. We also observe that larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages. Overall, Belebele opens up new avenues for evaluating and analyzing the multilingual capabilities of NLP systems.
    Transformers as Support Vector Machines. (arXiv:2308.16898v1 [cs.LG])
    Since its inception in "Attention Is All You Need", transformer architecture has led to revolutionary advancements in NLP. The attention layer within the transformer admits a sequence of input tokens $X$ and makes them interact through pairwise similarities computed as softmax$(XQK^\top X^\top)$, where $(K,Q)$ are the trainable key-query parameters. In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer-products of token pairs. This formalism allows us to characterize the implicit bias of 1-layer transformers optimized with gradient descent: (1) Optimizing the attention layer with vanishing regularization, parameterized by $(K,Q)$, converges in direction to an SVM solution minimizing the nuclear norm of the combined parameter $W=KQ^\top$. Instead, directly parameterizing by $W$ minimizes a Frobenius norm objective. We characterize this convergence, highlighting that it can occur toward locally-optimal directions rather than global ones. (2) Complementing this, we prove the local/global directional convergence of gradient descent under suitable geometric conditions. Importantly, we show that over-parameterization catalyzes global convergence by ensuring the feasibility of the SVM problem and by guaranteeing a benign optimization landscape devoid of stationary points. (3) While our theory applies primarily to linear prediction heads, we propose a more general SVM equivalence that predicts the implicit bias with nonlinear heads. Our findings are applicable to arbitrary datasets and their validity is verified via experiments. We also introduce several open problems and research directions. We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
    Federated Learning in UAV-Enhanced Networks: Joint Coverage and Convergence Time Optimization. (arXiv:2308.16889v1 [cs.LG])
    Federated learning (FL) involves several devices that collaboratively train a shared model without transferring their local data. FL reduces the communication overhead, making it a promising learning method in UAV-enhanced wireless networks with scarce energy resources. Despite the potential, implementing FL in UAV-enhanced networks is challenging, as conventional UAV placement methods that maximize coverage increase the FL delay significantly. Moreover, the uncertainty and lack of a priori information about crucial variables, such as channel quality, exacerbate the problem. In this paper, we first analyze the statistical characteristics of a UAV-enhanced wireless sensor network (WSN) with energy harvesting. We then develop a model and solution based on the multi-objective multi-armed bandit theory to maximize the network coverage while minimizing the FL delay. Besides, we propose another solution that is particularly useful with large action sets and strict energy constraints at the UAVs. Our proposal uses a scalarized best-arm identification algorithm to find the optimal arms that maximize the ratio of the expected reward to the expected energy cost by sequentially eliminating one or more arms in each round. Then, we derive the upper bound on the error probability of our multi-objective and cost-aware algorithm. Numerical results show the effectiveness of our approach.
    Efficacy of Neural Prediction-Based NAS for Zero-Shot NAS Paradigm. (arXiv:2308.16775v1 [cs.LG])
    In prediction-based Neural Architecture Search (NAS), performance indicators derived from graph convolutional networks have shown significant success. These indicators, achieved by representing feed-forward structures as component graphs through one-hot encoding, face a limitation: their inability to evaluate architecture performance across varying search spaces. In contrast, handcrafted performance indicators (zero-shot NAS), which use the same architecture with random initialization, can generalize across multiple search spaces. Addressing this limitation, we propose a novel approach for zero-shot NAS using deep learning. Our method employs Fourier sum of sines encoding for convolutional kernels, enabling the construction of a computational feed-forward graph with a structure similar to the architecture under evaluation. These encodings are learnable and offer a comprehensive view of the architecture's topological information. An accompanying multi-layer perceptron (MLP) then ranks these architectures based on their encodings. Experimental results show that our approach surpasses previous methods using graph convolutional networks in terms of correlation on the NAS-Bench-201 dataset and exhibits a higher convergence rate. Moreover, our extracted feature representation trained on each NAS-Benchmark is transferable to other NAS-Benchmarks, showing promising generalizability across multiple search spaces. The code is available at: https://github.com/minh1409/DFT-NPZS-NAS
    Moreau Envelope ADMM for Decentralized Weakly Convex Optimization. (arXiv:2308.16752v1 [math.OC])
    This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization. Although the current versions of ADMM algorithm provide promising numerical results in producing solutions that are close to optimal for many convex and non-convex optimization problems, it remains unclear if they can converge to a stationary point for weakly convex and locally non-smooth functions. Through our analysis using the Moreau envelope function, we demonstrate that MADM can indeed converge to a stationary point under mild conditions. Our analysis also includes computing the bounds on the amount of change in the dual variable update step by relating the gradient of the Moreau envelope function to the proximal function. Furthermore, the results of our numerical experiments indicate that our method is faster and more robust than widely-used approaches.
    StratMed: Relevance Stratification for Low-resource Medication Recommendation. (arXiv:2308.16781v1 [cs.AI])
    With the growing imbalance between limited medical resources and escalating demands, AI-based clinical tasks have become paramount. Medication recommendation, as a sub-domain, aims to amalgamate longitudinal patient history with medical knowledge, assisting physicians in prescribing safer and more accurate medication combinations. Existing methods overlook the inherent long-tail distribution in medical data, lacking balanced representation between head and tail data, which leads to sub-optimal model performance. To address this challenge, we introduce StratMed, a model that incorporates an innovative relevance stratification mechanism. It harmonizes discrepancies in data long-tail distribution and strikes a balance between the safety and accuracy of medication combinations. Specifically, we first construct a pre-training method using deep learning networks to obtain entity representation. After that, we design a pyramid-like data stratification method to obtain more generalized entity relationships by reinforcing the features of unpopular entities. Based on this relationship, we designed two graph structures to express medication precision and safety at the same level to obtain visit representations. Finally, the patient's historical clinical information is fitted to generate medication combinations for the current health condition. Experiments on the MIMIC-III dataset demonstrate that our method has outperformed current state-of-the-art methods in four evaluation metrics (including safety and accuracy).
    Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural Networks. (arXiv:2308.16800v1 [cs.LG])
    Our study reveals new theoretical insights into over-smoothing and feature over-correlation in deep graph neural networks. We show the prevalence of invariant subspaces, demonstrating a fixed relative behavior that is unaffected by feature transformations. Our work clarifies recent observations related to convergence to a constant state and a potential over-separation of node states, as the amplification of subspaces only depends on the spectrum of the aggregation function. In linear scenarios, this leads to node representations being dominated by a low-dimensional subspace with an asymptotic convergence rate independent of the feature transformations. This causes a rank collapse of the node representations, resulting in over-smoothing when smooth vectors span this subspace, and over-correlation even when over-smoothing is avoided. Guided by our theory, we propose a sum of Kronecker products as a beneficial property that can provably prevent over-smoothing, over-correlation, and rank collapse. We empirically extend our insights to the non-linear case, demonstrating the inability of existing models to capture linearly independent features.
    Language-Conditioned Path Planning. (arXiv:2308.16893v1 [cs.RO])
    Contact is at the core of robotic manipulation. At times, it is desired (e.g. manipulation and grasping), and at times, it is harmful (e.g. when avoiding obstacles). However, traditional path planning algorithms focus solely on collision-free paths, limiting their applicability in contact-rich tasks. To address this limitation, we propose the domain of Language-Conditioned Path Planning, where contact-awareness is incorporated into the path planning problem. As a first step in this domain, we propose Language-Conditioned Collision Functions (LACO) a novel approach that learns a collision function using only a single-view image, language prompt, and robot configuration. LACO predicts collisions between the robot and the environment, enabling flexible, conditional path planning without the need for manual object annotations, point cloud data, or ground-truth object meshes. In both simulation and the real world, we demonstrate that LACO can facilitate complex, nuanced path plans that allow for interaction with objects that are safe to collide, rather than prohibiting any collision.
    Multi-Objective Decision Transformers for Offline Reinforcement Learning. (arXiv:2308.16379v1 [cs.LG])
    Offline Reinforcement Learning (RL) is structured to derive policies from static trajectory data without requiring real-time environment interactions. Recent studies have shown the feasibility of framing offline RL as a sequence modeling task, where the sole aim is to predict actions based on prior context using the transformer architecture. However, the limitation of this single task learning approach is its potential to undermine the transformer model's attention mechanism, which should ideally allocate varying attention weights across different tokens in the input context for optimal prediction. To address this, we reformulate offline RL as a multi-objective optimization problem, where the prediction is extended to states and returns. We also highlight a potential flaw in the trajectory representation used for sequence modeling, which could generate inaccuracies when modeling the state and return distributions. This is due to the non-smoothness of the action distribution within the trajectory dictated by the behavioral policy. To mitigate this issue, we introduce action space regions to the trajectory representation. Our experiments on D4RL benchmark locomotion tasks reveal that our propositions allow for more effective utilization of the attention mechanism in the transformer model, resulting in performance that either matches or outperforms current state-of-the art methods.
    Information Theoretically Optimal Sample Complexity of Learning Dynamical Directed Acyclic Graphs. (arXiv:2308.16859v1 [stat.ML])
    In this article, the optimal sample complexity of learning the underlying interaction/dependencies of a Linear Dynamical System (LDS) over a Directed Acyclic Graph (DAG) is studied. The sample complexity of learning a DAG's structure is well-studied for static systems, where the samples of nodal states are independent and identically distributed (i.i.d.). However, such a study is less explored for DAGs with dynamical systems, where the nodal states are temporally correlated. We call such a DAG underlying an LDS as \emph{dynamical} DAG (DDAG). In particular, we consider a DDAG where the nodal dynamics are driven by unobserved exogenous noise sources that are wide-sense stationary (WSS) in time but are mutually uncorrelated, and have the same {power spectral density (PSD)}. Inspired by the static settings, a metric and an algorithm based on the PSD matrix of the observed time series are proposed to reconstruct the DDAG. The equal noise PSD assumption can be relaxed such that identifiability conditions for DDAG reconstruction are not violated. For the LDS with WSS (sub) Gaussian exogenous noise sources, it is shown that the optimal sample complexity (or length of state trajectory) needed to learn the DDAG is $n=\Theta(q\log(p/q))$, where $p$ is the number of nodes and $q$ is the maximum number of parents per node. To prove the sample complexity upper bound, a concentration bound for the PSD estimation is derived, under two different sampling strategies. A matching min-max lower bound using generalized Fano's inequality also is provided, thus showing the order optimality of the proposed algorithm.
    Robust Networked Federated Learning for Localization. (arXiv:2308.16737v1 [cs.LG])
    This paper addresses the problem of localization, which is inherently non-convex and non-smooth in a federated setting where the data is distributed across a multitude of devices. Due to the decentralized nature of federated environments, distributed learning becomes essential for scalability and adaptability. Moreover, these environments are often plagued by outlier data, which presents substantial challenges to conventional methods, particularly in maintaining estimation accuracy and ensuring algorithm convergence. To mitigate these challenges, we propose a method that adopts an $L_1$-norm robust formulation within a distributed sub-gradient framework, explicitly designed to handle these obstacles. Our approach addresses the problem in its original form, without resorting to iterative simplifications or approximations, resulting in enhanced computational efficiency and improved estimation accuracy. We demonstrate that our method converges to a stationary point, highlighting its effectiveness and reliability. Through numerical simulations, we confirm the superior performance of our approach, notably in outlier-rich environments, which surpasses existing state-of-the-art localization methods.
    Echocardiographic View Classification with Integrated Out-of-Distribution Detection for Enhanced Automatic Echocardiographic Analysis. (arXiv:2308.16483v1 [eess.SP])
    In the rapidly evolving field of automatic echocardiographic analysis and interpretation, automatic view classification is a critical yet challenging task, owing to the inherent complexity and variability of echocardiographic data. This study presents ECHOcardiography VIew Classification with Out-of-Distribution dEtection (ECHO-VICODE), a novel deep learning-based framework that effectively addresses this challenge by training to classify 31 classes, surpassing previous studies and demonstrating its capacity to handle a wide range of echocardiographic views. Furthermore, ECHO-VICODE incorporates an integrated out-of-distribution (OOD) detection function, leveraging the relative Mahalanobis distance to effectively identify 'near-OOD' instances commonly encountered in echocardiographic data. Through extensive experimentation, we demonstrated the outstanding performance of ECHO-VICODE in terms of view classification and OOD detection, significantly reducing the potential for errors in echocardiographic analyses. This pioneering study significantly advances the domain of automated echocardiography analysis and exhibits promising prospects for substantial applications in extensive clinical research and practice.
    Forecasting Emergency Department Crowding with Advanced Machine Learning Models and Multivariable Input. (arXiv:2308.16544v1 [cs.LG])
    Emergency department (ED) crowding is a significant threat to patient safety and it has been repeatedly associated with increased mortality. Forecasting future service demand has the potential patient outcomes. Despite active research on the subject, several gaps remain: 1) proposed forecasting models have become outdated due to quick influx of advanced machine learning models (ML), 2) amount of multivariable input data has been limited and 3) discrete performance metrics have been rarely reported. In this study, we document the performance of a set of advanced ML models in forecasting ED occupancy 24 hours ahead. We use electronic health record data from a large, combined ED with an extensive set of explanatory variables, including the availability of beds in catchment area hospitals, traffic data from local observation stations, weather variables, etc. We show that N-BEATS and LightGBM outpeform benchmarks with 11 % and 9 % respective improvements and that DeepAR predicts next day crowding with an AUC of 0.76 (95 % CI 0.69-0.84). To the best of our knowledge, this is the first study to document the superiority of LightGBM and N-BEATS over statistical benchmarks in the context of ED forecasting.
    Improving Robustness and Accuracy of Ponzi Scheme Detection on Ethereum Using Time-Dependent Features. (arXiv:2308.16391v1 [cs.CR])
    The rapid development of blockchain has led to more and more funding pouring into the cryptocurrency market, which also attracted cybercriminals' interest in recent years. The Ponzi scheme, an old-fashioned fraud, is now popular on the blockchain, causing considerable financial losses to many crypto-investors. A few Ponzi detection methods have been proposed in the literature, most of which detect a Ponzi scheme based on its smart contract source code or opcode. The contract-code-based approach, while achieving very high accuracy, is not robust: first, the source codes of a majority of contracts on Ethereum are not available, and second, a Ponzi developer can fool a contract-code-based detection model by obfuscating the opcode or inventing a new profit distribution logic that cannot be detected (since these models were trained on existing Ponzi logics only). A transaction-based approach could improve the robustness of detection because transactions, unlike smart contracts, are harder to be manipulated. However, the current transaction-based detection models achieve fairly low accuracy. We address this gap in the literature by developing new detection models that rely only on the transactions, hence guaranteeing the robustness, and moreover, achieve considerably higher Accuracy, Precision, Recall, and F1-score than existing transaction-based models. This is made possible thanks to the introduction of novel time-dependent features that capture Ponzi behaviours characteristics derived from our comprehensive data analyses on Ponzi and non-Ponzi data from the XBlock-ETH repository
    Everyone Can Attack: Repurpose Lossy Compression as a Natural Backdoor Attack. (arXiv:2308.16684v1 [cs.CR])
    The vulnerabilities to backdoor attacks have recently threatened the trustworthiness of machine learning models in practical applications. Conventional wisdom suggests that not everyone can be an attacker since the process of designing the trigger generation algorithm often involves significant effort and extensive experimentation to ensure the attack's stealthiness and effectiveness. Alternatively, this paper shows that there exists a more severe backdoor threat: anyone can exploit an easily-accessible algorithm for silent backdoor attacks. Specifically, this attacker can employ the widely-used lossy image compression from a plethora of compression tools to effortlessly inject a trigger pattern into an image without leaving any noticeable trace; i.e., the generated triggers are natural artifacts. One does not require extensive knowledge to click on the "convert" or "save as" button while using tools for lossy image compression. Via this attack, the adversary does not need to design a trigger generator as seen in prior works and only requires poisoning the data. Empirically, the proposed attack consistently achieves 100% attack success rate in several benchmark datasets such as MNIST, CIFAR-10, GTSRB and CelebA. More significantly, the proposed attack can still achieve almost 100% attack success rate with very small (approximately 10%) poisoning rates in the clean label setting. The generated trigger of the proposed attack using one lossy compression algorithm is also transferable across other related compression algorithms, exacerbating the severity of this backdoor threat. This work takes another crucial step toward understanding the extensive risks of backdoor attacks in practice, urging practitioners to investigate similar attacks and relevant backdoor mitigation methods.
    Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing. (arXiv:2308.16671v1 [cs.LG])
    Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications. Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging, as there is no central server to coordinate the training process. Especially when distributed nodes suffer from limitations in communication or computational resources, DFL will experience extremely inefficient and unstable training. Motivated by these challenges, in this paper, we develop a novel algorithm based on the framework of the inexact alternating direction method (iADM). On one hand, our goal is to train a shared model with a sparsity constraint. This constraint enables us to leverage one-bit compressive sensing (1BCS), allowing transmission of one-bit information among neighbour nodes. On the other hand, communication between neighbour nodes occurs only at certain steps, reducing the number of communication rounds. Therefore, the algorithm exhibits notable communication efficiency. Additionally, as each node selects only a subset of neighbours to participate in the training, the algorithm is robust against stragglers. Additionally, complex items are computed only once for several consecutive steps and subproblems are solved inexactly using closed-form solutions, resulting in high computational efficiency. Finally, numerical experiments showcase the algorithm's effectiveness in both communication and computation.
    Majorization-Minimization for sparse SVMs. (arXiv:2308.16858v1 [cs.LG])
    Several decades ago, Support Vector Machines (SVMs) were introduced for performing binary classification tasks, under a supervised framework. Nowadays, they often outperform other supervised methods and remain one of the most popular approaches in the machine learning arena. In this work, we investigate the training of SVMs through a smooth sparse-promoting-regularized squared hinge loss minimization. This choice paves the way to the application of quick training methods built on majorization-minimization approaches, benefiting from the Lipschitz differentiabililty of the loss function. Moreover, the proposed approach allows us to handle sparsity-preserving regularizers promoting the selection of the most significant features, so enhancing the performance. Numerical tests and comparisons conducted on three different datasets demonstrate the good performance of the proposed methodology in terms of qualitative metrics (accuracy, precision, recall, and F 1 score) as well as computational cost.
    Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps. (arXiv:2308.16648v1 [cs.CV])
    Despite recent advancements in image generation, diffusion models still remain largely underexplored in Earth Observation. In this paper we show that state-of-the-art pretrained diffusion models can be conditioned on cartographic data to generate realistic satellite images. We provide two large datasets of paired OpenStreetMap images and satellite views over the region of Mainland Scotland and the Central Belt. We train a ControlNet model and qualitatively evaluate the results, demonstrating that both image quality and map fidelity are possible. Finally, we provide some insights on the opportunities and challenges of applying these models for remote sensing. Our model weights and code for creating the dataset are publicly available at https://github.com/miquel-espinosa/map-sat.
    Calibrated Explanations for Regression. (arXiv:2308.16245v1 [cs.LG])
    Artificial Intelligence (AI) is often an integral part of modern decision support systems (DSSs). The best-performing predictive models used in AI-based DSSs lack transparency. Explainable Artificial Intelligence (XAI) aims to create AI systems that can explain their rationale to human users. Local explanations in XAI can provide information about the causes of individual predictions in terms of feature importance. However, a critical drawback of existing local explanation methods is their inability to quantify the uncertainty associated with a feature's importance. This paper introduces an extension of a feature importance explanation method, Calibrated Explanations (CE), previously only supporting classification, with support for standard regression and probabilistic regression, i.e., the probability that the target is above an arbitrary threshold. The extension for regression keeps all the benefits of CE, such as calibration of the prediction from the underlying model with confidence intervals, uncertainty quantification of feature importance, and allows both factual and counterfactual explanations. CE for standard regression provides fast, reliable, stable, and robust explanations. CE for probabilistic regression provides an entirely new way of creating probabilistic explanations from any ordinary regression model and with a dynamic selection of thresholds. The performance of CE for probabilistic regression regarding stability and speed is comparable to LIME. The method is model agnostic with easily understood conditional rules. An implementation in Python is freely available on GitHub and for installation using pip making the results in this paper easily replicable.
    Training Neural Networks Using Reproducing Kernel Space Interpolation and Model Reduction. (arXiv:2308.16754v1 [math.FA])
    We introduce and study the theory of training neural networks using interpolation techniques from reproducing kernel Hilbert space theory. We generalize the method to Krein spaces, and show that widely-used neural network architectures are subsets of reproducing kernel Krein spaces (RKKS). We study the concept of "associated Hilbert spaces" of RKKS and develop techniques to improve upon the expressivity of various activation functions. Next, using concepts from the theory of functions of several complex variables, we prove a computationally applicable, multidimensional generalization of the celebrated Adamjan- Arov-Krein (AAK) theorem. The theorem yields a novel class of neural networks, called Prolongation Neural Networks (PNN). We demonstrate that, by applying the multidimensional AAK theorem to gain a PNN, one can gain performance superior to both our interpolatory methods and current state-of-the-art methods in noisy environments. We provide useful illustrations of our methods in practice.
    Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study. (arXiv:2308.16585v1 [cs.LG])
    Background Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods In this multinational retrospective observational study we enrolled adult participants (aged $\ge$18 years) from ten prospective cohorts (including ABOS [NCT01129297], BAREVAL [NCT02310178], the Swedish Obese Subjects study, and a large cohort from the Dutch Obesity Clinic [Nederlandse Obesitas Kliniek]) and two randomised trials (SleevePass [NCT00793143] and SM-BOSS [NCT00356213]) in Europe, the Americas, and Asia, with a 5 year followup after Roux-en-Y gastric bypass, sleeve g…
    Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation. (arXiv:2308.16598v1 [eess.IV])
    Detection of tumors in metastatic colorectal cancer (mCRC) plays an essential role in the early diagnosis and treatment of liver cancer. Deep learning models backboned by fully convolutional neural networks (FCNNs) have become the dominant model for segmenting 3D computerized tomography (CT) scans. However, since their convolution layers suffer from limited kernel size, they are not able to capture long-range dependencies and global context. To tackle this restriction, vision transformers have been introduced to solve FCNN's locality of receptive fields. Although transformers can capture long-range features, their segmentation performance decreases with various tumor sizes due to the model sensitivity to the input patch size. While finding an optimal patch size improves the performance of vision transformer-based models on segmentation tasks, it is a time-consuming and challenging procedure. This paper proposes a technique to select the vision transformer's optimal input multi-resolution image patch size based on the average volume size of metastasis lesions. We further validated our suggested framework using a transfer-learning technique, demonstrating that the highest Dice similarity coefficient (DSC) performance was obtained by pre-training on training data with a larger tumour volume using the suggested ideal patch size and then training with a smaller one. We experimentally evaluate this idea through pre-training our model on a multi-resolution public dataset. Our model showed consistent and improved results when applied to our private multi-resolution mCRC dataset with a smaller average tumor volume. This study lays the groundwork for optimizing semantic segmentation of small objects using vision transformers. The implementation source code is available at:https://github.com/Ramtin-Mojtahedi/OVTPS.
    MONDEO: Multistage Botnet Detection. (arXiv:2308.16570v1 [cs.CR])
    Mobile devices have widespread to become the most used piece of technology. Due to their characteristics, they have become major targets for botnet-related malware. FluBot is one example of botnet malware that infects mobile devices. In particular, FluBot is a DNS-based botnet that uses Domain Generation Algorithms (DGA) to establish communication with the Command and Control Server (C2). MONDEO is a multistage mechanism with a flexible design to detect DNS-based botnet malware. MONDEO is lightweight and can be deployed without requiring the deployment of software, agents, or configuration in mobile devices, allowing easy integration in core networks. MONDEO comprises four detection stages: Blacklisting/Whitelisting, Query rate analysis, DGA analysis, and Machine learning evaluation. It was created with the goal of processing streams of packets to identify attacks with high efficiency, in the distinct phases. MONDEO was tested against several datasets to measure its efficiency and performance, being able to achieve high performance with RandomForest classifiers. The implementation is available at github.
    Latent Painter. (arXiv:2308.16490v1 [cs.CV])
    Latent diffusers revolutionized the generative AI and inspired creative art. When denoising the latent, the predicted original image at each step collectively animates the formation. However, the animation is limited by the denoising nature of the diffuser, and only renders a sharpening process. This work presents Latent Painter, which uses the latent as the canvas, and the diffuser predictions as the plan, to generate painting animation. Latent Painter also transits one generated image to another, which can happen between images from two different sets of checkpoints.
    Constructing Indoor Region-based Radio Map without Location Labels. (arXiv:2308.16759v1 [cs.LG])
    Radio map construction requires a large amount of radio measurement data with location labels, which imposes a high deployment cost. This paper develops a region-based radio map from received signal strength (RSS) measurements without location labels. The construction is based on a set of blindly collected RSS measurement data from a device that visits each region in an indoor area exactly once, where the footprints and timestamps are not recorded. The main challenge is to cluster the RSS data and match clusters with the physical regions. Classical clustering algorithms fail to work as the RSS data naturally appears as non-clustered due to multipaths and noise. In this paper, a signal subspace model with a sequential prior is constructed for the RSS data, and an integrated segmentation and clustering algorithm is developed, which is shown to find the globally optimal solution in a special case. Furthermore, the clustered data is matched with the physical regions using a graph-based approach. Based on real measurements from an office space, the proposed scheme reduces the region localization error by roughly 50% compared to a weighted centroid localization (WCL) baseline, and it even outperforms some supervised localization schemes, including k-nearest neighbor (KNN), support vector machine (SVM), and deep neural network (DNN), which require labeled data for training.
    Scalable Incomplete Multi-View Clustering with Structure Alignment. (arXiv:2308.16541v1 [cs.LG])
    The success of existing multi-view clustering (MVC) relies on the assumption that all views are complete. However, samples are usually partially available due to data corruption or sensor malfunction, which raises the research of incomplete multi-view clustering (IMVC). Although several anchor-based IMVC methods have been proposed to process the large-scale incomplete data, they still suffer from the following drawbacks: i) Most existing approaches neglect the inter-view discrepancy and enforce cross-view representation to be consistent, which would corrupt the representation capability of the model; ii) Due to the samples disparity between different views, the learned anchor might be misaligned, which we referred as the Anchor-Unaligned Problem for Incomplete data (AUP-ID). Such the AUP-ID would cause inaccurate graph fusion and degrades clustering performance. To tackle these issues, we propose a novel incomplete anchor graph learning framework termed Scalable Incomplete Multi-View Clustering with Structure Alignment (SIMVC-SA). Specially, we construct the view-specific anchor graph to capture the complementary information from different views. In order to solve the AUP-ID, we propose a novel structure alignment module to refine the cross-view anchor correspondence. Meanwhile, the anchor graph construction and alignment are jointly optimized in our unified framework to enhance clustering quality. Through anchor graph construction instead of full graphs, the time and space complexity of the proposed SIMVC-SA is proven to be linearly correlated with the number of samples. Extensive experiments on seven incomplete benchmark datasets demonstrate the effectiveness and efficiency of our proposed method. Our code is publicly available at https://github.com/wy1019/SIMVC-SA.
    CL-MAE: Curriculum-Learned Masked Autoencoders. (arXiv:2308.16572v1 [cs.CV])
    Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that can be effectively generalized across multiple downstream tasks. Typically, this approach involves randomly masking patches (tokens) in input images, with the masking strategy remaining unchanged during training. In this paper, we propose a curriculum learning approach that updates the masking strategy to continually increase the complexity of the self-supervised reconstruction task. We conjecture that, by gradually increasing the task complexity, the model can learn more sophisticated and transferable representations. To facilitate this, we introduce a novel learnable masking module that possesses the capability to generate masks of different complexities, and integrate the proposed module into masked autoencoders (MAE). Our module is jointly trained with the MAE, while adjusting its behavior during training, transitioning from a partner to the MAE (optimizing the same reconstruction loss) to an adversary (optimizing the opposite loss), while passing through a neutral state. The transition between these behaviors is smooth, being regulated by a factor that is multiplied with the reconstruction loss of the masking module. The resulting training procedure generates an easy-to-hard curriculum. We train our Curriculum-Learned Masked Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior representation learning capabilities compared to MAE. The empirical results on five downstream tasks confirm our conjecture, demonstrating that curriculum learning can be successfully used to self-supervise masked autoencoders.
    What can we learn from quantum convolutional neural networks?. (arXiv:2308.16664v1 [quant-ph])
    We can learn from analyzing quantum convolutional neural networks (QCNNs) that: 1) working with quantum data can be perceived as embedding physical system parameters through a hidden feature map; 2) their high performance for quantum phase recognition can be attributed to generation of a very suitable basis set during the ground state embedding, where quantum criticality of spin models leads to basis functions with rapidly changing features; 3) pooling layers of QCNNs are responsible for picking those basis functions that can contribute to forming a high-performing decision boundary, and the learning process corresponds to adapting the measurement such that few-qubit operators are mapped to full-register observables; 4) generalization of QCNN models strongly depends on the embedding type, and that rotation-based feature maps with the Fourier basis require careful feature engineering; 5) accuracy and generalization of QCNNs with readout based on a limited number of shots favor the ground state embeddings and associated physics-informed models. We demonstrate these points in simulation, where our results shed light on classification for physical processes, relevant for applications in sensing. Finally, we show that QCNNs with properly chosen ground state embeddings can be used for fluid dynamics problems, expressing shock wave solutions with good generalization and proven trainability.
    Document Layout Analysis on BaDLAD Dataset: A Comprehensive MViTv2 Based Approach. (arXiv:2308.16571v1 [cs.CV])
    In the rapidly evolving digital era, the analysis of document layouts plays a pivotal role in automated information extraction and interpretation. In our work, we have trained MViTv2 transformer model architecture with cascaded mask R-CNN on BaDLAD dataset to extract text box, paragraphs, images and tables from a document. After training on 20365 document images for 36 epochs in a 3 phase cycle, we achieved a training loss of 0.2125 and a mask loss of 0.19. Our work extends beyond training, delving into the exploration of potential enhancement avenues. We investigate the impact of rotation and flip augmentation, the effectiveness of slicing input images pre-inference, the implications of varying the resolution of the transformer backbone, and the potential of employing a dual-pass inference to uncover missed text-boxes. Through these explorations, we observe a spectrum of outcomes, where some modifications result in tangible performance improvements, while others offer unique insights for future endeavors.
    BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge. (arXiv:2308.16458v1 [cs.LG])
    Pre-trained language models like ChatGPT have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks. Moreover, in bioinformatics, generating functional programs poses additional notable challenges due to the amount of domain knowledge, the need for complicated data operations, and intricate functional dependencies between the operations. Here, we present BioCoder, a benchmark developed to evaluate existing pre-trained models in generating bioinformatics code. In relation to function-code generation, BioCoder covers potential package dependencies, class declarations, and global variables. It incorporates 1026 functions and 1243 methods in Python and Java from GitHub and 253 examples from the Rosalind Project. BioCoder incorporates a fuzz-testing framework for evaluation, and we have applied it to evaluate many models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, and ChatGPT. Our detailed analysis of these models emphasizes the importance of domain knowledge, pragmatic code generation, and contextual understanding. Our dataset, benchmark, Docker images, and scripts required for testing are all available at https://github.com/gersteinlab/biocoder.
    Towards Long-Tailed Recognition for Graph Classification via Collaborative Experts. (arXiv:2308.16609v1 [cs.LG])
    Graph classification, aiming at learning the graph-level representations for effective class assignments, has received outstanding achievements, which heavily relies on high-quality datasets that have balanced class distribution. In fact, most real-world graph data naturally presents a long-tailed form, where the head classes occupy much more samples than the tail classes, it thus is essential to study the graph-level classification over long-tailed data while still remaining largely unexplored. However, most existing long-tailed learning methods in visions fail to jointly optimize the representation learning and classifier training, as well as neglect the mining of the hard-to-classify classes. Directly applying existing methods to graphs may lead to sub-optimal performance, since the model trained on graphs would be more sensitive to the long-tailed distribution due to the complex topological characteristics. Hence, in this paper, we propose a novel long-tailed graph-level classification framework via Collaborative Multi-expert Learning (CoMe) to tackle the problem. To equilibrate the contributions of head and tail classes, we first develop balanced contrastive learning from the view of representation learning, and then design an individual-expert classifier training based on hard class mining. In addition, we execute gated fusion and disentangled knowledge distillation among the multiple experts to promote the collaboration in a multi-expert framework. Comprehensive experiments are performed on seven widely-used benchmark datasets to demonstrate the superiority of our method CoMe over state-of-the-art baselines.
    Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis. (arXiv:2308.16593v1 [cs.SD])
    The spontaneous behavior that often occurs in conversations makes speech more human-like compared to reading-style. However, synthesizing spontaneous-style speech is challenging due to the lack of high-quality spontaneous datasets and the high cost of labeling spontaneous behavior. In this paper, we propose a semi-supervised pre-training method to increase the amount of spontaneous-style speech and spontaneous behavioral labels. In the process of semi-supervised learning, both text and speech information are considered for detecting spontaneous behaviors labels in speech. Moreover, a linguistic-aware encoder is used to model the relationship between each sentence in the conversation. Experimental results indicate that our proposed method achieves superior expressive speech synthesis performance with the ability to model spontaneous behavior in spontaneous-style speech and predict reasonable spontaneous behavior from text.
    SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects. (arXiv:2308.16528v1 [cs.CV])
    To enable meaningful robotic manipulation of objects in the real-world, 6D pose estimation is one of the critical aspects. Most existing approaches have difficulties to extend predictions to scenarios where novel object instances are continuously introduced, especially with heavy occlusions. In this work, we propose a few-shot pose estimation (FSPE) approach called SA6D, which uses a self-adaptive segmentation module to identify the novel target object and construct a point cloud model of the target object using only a small number of cluttered reference images. Unlike existing methods, SA6D does not require object-centric reference images or any additional object information, making it a more generalizable and scalable solution across categories. We evaluate SA6D on real-world tabletop object datasets and demonstrate that SA6D outperforms existing FSPE methods, particularly in cluttered scenes with occlusions, while requiring fewer reference images.
    Transformer-based interpretable multi-modal data fusion for skin lesion classification. (arXiv:2304.14505v2 [eess.IV] UPDATED)
    A lot of deep learning (DL) research these days is mainly focused on improving quantitative metrics regardless of other factors. In human-centered applications, like skin lesion classification in dermatology, DL-driven clinical decision support systems are still in their infancy due to the limited transparency of their decision-making process. Moreover, the lack of procedures that can explain the behavior of trained DL algorithms leads to almost no trust from clinical physicians. To diagnose skin lesions, dermatologists rely on visual assessment of the disease and the data gathered from the patient's anamnesis. Data-driven algorithms dealing with multi-modal data are limited by the separation of feature-level and decision-level fusion procedures required by convolutional architectures. To address this issue, we enable single-stage multi-modal data fusion via the attention mechanism of transformer-based architectures to aid in diagnosing skin diseases. Our method beats other state-of-the-art single- and multi-modal DL architectures in image-rich and patient-data-rich environments. Additionally, the choice of the architecture enables native interpretability support for the classification task both in the image and metadata domain with no additional modifications necessary.
    On a Connection between Differential Games, Optimal Control, and Energy-based Models for Multi-Agent Interactions. (arXiv:2308.16539v1 [cs.RO])
    Game theory offers an interpretable mathematical framework for modeling multi-agent interactions. However, its applicability in real-world robotics applications is hindered by several challenges, such as unknown agents' preferences and goals. To address these challenges, we show a connection between differential games, optimal control, and energy-based models and demonstrate how existing approaches can be unified under our proposed Energy-based Potential Game formulation. Building upon this formulation, this work introduces a new end-to-end learning application that combines neural networks for game-parameter inference with a differentiable game-theoretic optimization layer, acting as an inductive bias. The experiments using simulated mobile robot pedestrian interactions and real-world automated driving data provide empirical evidence that the game-theoretic layer improves the predictive performance of various neural network backbones.
    Domain-adaptive Message Passing Graph Neural Network. (arXiv:2308.16470v1 [cs.LG])
    Cross-network node classification (CNNC), which aims to classify nodes in a label-deficient target network by transferring the knowledge from a source network with abundant labels, draws increasing attention recently. To address CNNC, we propose a domain-adaptive message passing graph neural network (DM-GNN), which integrates graph neural network (GNN) with conditional adversarial domain adaptation. DM-GNN is capable of learning informative representations for node classification that are also transferrable across networks. Firstly, a GNN encoder is constructed by dual feature extractors to separate ego-embedding learning from neighbor-embedding learning so as to jointly capture commonality and discrimination between connected nodes. Secondly, a label propagation node classifier is proposed to refine each node's label prediction by combining its own prediction and its neighbors' prediction. In addition, a label-aware propagation scheme is devised for the labeled source network to promote intra-class propagation while avoiding inter-class propagation, thus yielding label-discriminative source embeddings. Thirdly, conditional adversarial domain adaptation is performed to take the neighborhood-refined class-label information into account during adversarial domain adaptation, so that the class-conditional distributions across networks can be better matched. Comparisons with eleven state-of-the-art methods demonstrate the effectiveness of the proposed DM-GNN.
    Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning. (arXiv:2308.16481v1 [cs.CV])
    We present Point-TTA, a novel test-time adaptation framework for point cloud registration (PCR) that improves the generalization and the performance of registration models. While learning-based approaches have achieved impressive progress, generalization to unknown testing environments remains a major challenge due to the variations in 3D scans. Existing methods typically train a generic model and the same trained model is applied on each instance during testing. This could be sub-optimal since it is difficult for the same model to handle all the variations during testing. In this paper, we propose a test-time adaptation approach for PCR. Our model can adapt to unseen distributions at test-time without requiring any prior knowledge of the test data. Concretely, we design three self-supervised auxiliary tasks that are optimized jointly with the primary PCR task. Given a test instance, we adapt our model using these auxiliary tasks and the updated model is used to perform the inference. During training, our model is trained using a meta-auxiliary learning approach, such that the adapted model via auxiliary tasks improves the accuracy of the primary task. Experimental results demonstrate the effectiveness of our approach in improving generalization of point cloud registration and outperforming other state-of-the-art approaches.
    Test-Time Adaptation for Point Cloud Upsampling Using Meta-Learning. (arXiv:2308.16484v1 [cs.CV])
    Affordable 3D scanners often produce sparse and non-uniform point clouds that negatively impact downstream applications in robotic systems. While existing point cloud upsampling architectures have demonstrated promising results on standard benchmarks, they tend to experience significant performance drops when the test data have different distributions from the training data. To address this issue, this paper proposes a test-time adaption approach to enhance model generality of point cloud upsampling. The proposed approach leverages meta-learning to explicitly learn network parameters for test-time adaption. Our method does not require any prior information about the test data. During meta-training, the model parameters are learned from a collection of instance-level tasks, each of which consists of a sparse-dense pair of point clouds from the training data. During meta-testing, the trained model is fine-tuned with a few gradient updates to produce a unique set of network parameters for each test instance. The updated model is then used for the final prediction. Our framework is generic and can be applied in a plug-and-play manner with existing backbone networks in point cloud upsampling. Extensive experiments demonstrate that our approach improves the performance of state-of-the-art models.
    BenchTemp: A General Benchmark for Evaluating Temporal Graph Neural Networks. (arXiv:2308.16385v1 [cs.LG])
    To handle graphs in which features or connectivities are evolving over time, a series of temporal graph neural networks (TGNNs) have been proposed. Despite the success of these TGNNs, the previous TGNN evaluations reveal several limitations regarding four critical issues: 1) inconsistent datasets, 2) inconsistent evaluation pipelines, 3) lacking workload diversity, and 4) lacking efficient comparison. Overall, there lacks an empirical study that puts TGNN models onto the same ground and compares them comprehensively. To this end, we propose BenchTemp, a general benchmark for evaluating TGNN models on various workloads. BenchTemp provides a set of benchmark datasets so that different TGNN models can be fairly compared. Further, BenchTemp engineers a standard pipeline that unifies the TGNN evaluation. With BenchTemp, we extensively compare the representative TGNN models on different tasks (e.g., link prediction and node classification) and settings (transductive and inductive), w.r.t. both effectiveness and efficiency metrics. We have made BenchTemp publicly available at https://github.com/qianghuangwhu/benchtemp.
    In-class Data Analysis Replications: Teaching Students while Testing Science. (arXiv:2308.16491v1 [cs.CY])
    Science is facing a reproducibility crisis. Previous work has proposed incorporating data analysis replications into classrooms as a potential solution. However, despite the potential benefits, it is unclear whether this approach is feasible, and if so, what the involved stakeholders-students, educators, and scientists-should expect from it. Can students perform a data analysis replication over the course of a class? What are the costs and benefits for educators? And how can this solution help benchmark and improve the state of science? In the present study, we incorporated data analysis replications in the project component of the Applied Data Analysis course (CS-401) taught at EPFL (N=354 students). Here we report pre-registered findings based on surveys administered throughout the course. First, we demonstrate that students can replicate previously published scientific papers, most of them qualitatively and some exactly. We find discrepancies between what students expect of data analysis replications and what they experience by doing them along with changes in expectations about reproducibility, which together serve as evidence of attitude shifts to foster students' critical thinking. Second, we provide information for educators about how much overhead is needed to incorporate replications into the classroom and identify concerns that replications bring as compared to more traditional assignments. Third, we identify tangible benefits of the in-class data analysis replications for scientific communities, such as a collection of replication reports and insights about replication barriers in scientific work that should be avoided going forward. Overall, we demonstrate that incorporating replication tasks into a large data science class can increase the reproducibility of scientific work as a by-product of data science instruction, thus benefiting both science and students.
    Backpropagation through Back Substitution with a Backslash. (arXiv:2303.15449v2 [math.NA] UPDATED)
    We present a linear algebra formulation of backpropagation which allows the calculation of gradients by using a generically written ``backslash'' or Gaussian elimination on triangular systems of equations. Generally, the matrix elements are operators. This paper has three contributions: (i) it is of intellectual value to replace traditional treatments of automatic differentiation with a (left acting) operator theoretic, graph-based approach; (ii) operators can be readily placed in matrices in software in programming languages such as Julia as an implementation option; (iii) we introduce a novel notation, ``transpose dot'' operator ``$\{\}^{T_\bullet}$'' that allows for the reversal of operators. We further demonstrate the elegance of the operators approach in a suitable programming language consisting of generic linear algebra operators such as Julia \cite{bezanson2017julia}, and that it is possible to realize this abstraction in code. Our implementation shows how generic linear algebra can allow operators as elements of matrices. In contrast to ``operator overloading,'' where backslash would normally have to be rewritten to take advantage of operators, with ``generic programming'' there is no such need.
    Knowledge Enhanced Graph Neural Networks for Graph Completion. (arXiv:2303.15487v3 [cs.AI] UPDATED)
    Graph data is omnipresent and has a wide variety of applications, such as in natural science, social networks, or the semantic web. However, while being rich in information, graphs are often noisy and incomplete. As a result, graph completion tasks, such as node classification or link prediction, have gained attention. On one hand, neural methods, such as graph neural networks, have proven to be robust tools for learning rich representations of noisy graphs. On the other hand, symbolic methods enable exact reasoning on graphs.We propose Knowledge Enhanced Graph Neural Networks (KeGNN), a neuro-symbolic framework for graph completion that combines both paradigms as it allows for the integration of prior knowledge into a graph neural network model.Essentially, KeGNN consists of a graph neural network as a base upon which knowledge enhancement layers are stacked with the goal of refining predictions with respect to prior knowledge.We instantiate KeGNN in conjunction with two state-of-the-art graph neural networks, Graph Convolutional Networks and Graph Attention Networks, and evaluate KeGNN on multiple benchmark datasets for node classification.
    Invertible normalizing flow neural networks by JKO scheme. (arXiv:2212.14424v2 [stat.ML] UPDATED)
    Normalizing flow is a class of deep generative models for efficient sampling and density estimation. In practice, the flow often appears as a chain of invertible neural network blocks; to facilitate training, existing works have regularized flow trajectories and designed special network architectures. The current paper develops a neural ODE flow network inspired by the Jordan-Kinderleherer-Otto (JKO) scheme, which allows efficient block-wise training of the residual blocks without sampling SDE trajectories or inner loops of score matching or variational learning. As the JKO scheme unfolds the dynamic of gradient flow, the proposed model naturally stacks residual network blocks one by one, reducing the memory load and difficulty in performing end-to-end deep flow network training. We also develop adaptive time reparameterization of the flow network with a progressive refinement of the trajectory in probability space, which improves the model training efficiency and accuracy in practice. Using numerical experiments with synthetic and real data, we show that the proposed JKO-iFlow model achieves similar or better performance in generating new samples compared with the existing flow and diffusion models at a significantly reduced computational and memory cost.
    Neuronal diversity can improve machine learning for physics and beyond. (arXiv:2204.04348v3 [cs.LG] UPDATED)
    Diversity conveys advantages in nature, yet homogeneous neurons typically comprise the layers of artificial neural networks. Here we construct neural networks from neurons that learn their own activation functions, quickly diversify, and subsequently outperform their homogeneous counterparts on image classification and nonlinear regression tasks. Sub-networks instantiate the neurons, which meta-learn especially efficient sets of nonlinear responses. Examples include conventional neural networks classifying digits and forecasting a van der Pol oscillator and physics-informed Hamiltonian neural networks learning H\'enon-Heiles stellar orbits and the swing of a video recorded pendulum clock. Such \textit{learned diversity} provides examples of dynamical systems selecting diversity over uniformity and elucidates the role of diversity in natural and artificial systems.
    System identification of neural systems: If we got it right, would we know?. (arXiv:2302.06677v2 [q-bio.NC] UPDATED)
    Artificial neural networks are being proposed as models of parts of the brain. The networks are compared to recordings of biological neurons, and good performance in reproducing neural responses is considered to support the model's validity. A key question is how much this system identification approach tells us about brain computation. Does it validate one model architecture over another? We evaluate the most commonly used comparison techniques, such as a linear encoding model and centered kernel alignment, to correctly identify a model by replacing brain recordings with known ground truth models. System identification performance is quite variable; it also depends significantly on factors independent of the ground truth architecture, such as stimuli images. In addition, we show the limitations of using functional similarity scores in identifying higher-level architectural motifs.
    Balancing between the Local and Global Structures (LGS) in Graph Embedding. (arXiv:2308.16403v1 [cs.HC])
    We present a method for balancing between the Local and Global Structures (LGS) in graph embedding, via a tunable parameter. Some embedding methods aim to capture global structures, while others attempt to preserve local neighborhoods. Few methods attempt to do both, and it is not always possible to capture well both local and global information in two dimensions, which is where most graph drawing live. The choice of using a local or a global embedding for visualization depends not only on the task but also on the structure of the underlying data, which may not be known in advance. For a given graph, LGS aims to find a good balance between the local and global structure to preserve. We evaluate the performance of LGS with synthetic and real-world datasets and our results indicate that it is competitive with the state-of-the-art methods, using established quality metrics such as stress and neighborhood preservation. We introduce a novel quality metric, cluster distance preservation, to assess intermediate structure capture. All source-code, datasets, experiments and analysis are available online.
    Listen to Minority: Encrypted Traffic Classification for Class Imbalance with Contrastive Pre-Training. (arXiv:2308.16453v1 [cs.CR])
    Mobile Internet has profoundly reshaped modern lifestyles in various aspects. Encrypted Traffic Classification (ETC) naturally plays a crucial role in managing mobile Internet, especially with the explosive growth of mobile apps using encrypted communication. Despite some existing learning-based ETC methods showing promising results, three-fold limitations still remain in real-world network environments, 1) label bias caused by traffic class imbalance, 2) traffic homogeneity caused by component sharing, and 3) training with reliance on sufficient labeled traffic. None of the existing ETC methods can address all these limitations. In this paper, we propose a novel Pre-trAining Semi-Supervised ETC framework, dubbed PASS. Our key insight is to resample the original train dataset and perform contrastive pre-training without using individual app labels directly to avoid label bias issues caused by class imbalance, while obtaining a robust feature representation to differentiate overlapping homogeneous traffic by pulling positive traffic pairs closer and pushing negative pairs away. Meanwhile, PASS designs a semi-supervised optimization strategy based on pseudo-label iteration and dynamic loss weighting algorithms in order to effectively utilize massive unlabeled traffic data and alleviate manual train dataset annotation workload. PASS outperforms state-of-the-art ETC methods and generic sampling approaches on four public datasets with significant class imbalance and traffic homogeneity, remarkably pushing the F1 of Cross-Platform215 with 1.31%, ISCX-17 with 9.12%. Furthermore, we validate the generality of the contrastive pre-training and pseudo-label iteration components of PASS, which can adaptively benefit ETC methods with diverse feature extractors.
    Least Squares Maximum and Weighted Generalization-Memorization Machines. (arXiv:2308.16456v1 [stat.ML])
    In this paper, we propose a new way of remembering by introducing a memory influence mechanism for the least squares support vector machine (LSSVM). Without changing the equation constraints of the original LSSVM, this mechanism, allows an accurate partitioning of the training set without overfitting. The maximum memory impact model (MIMM) and the weighted impact memory model (WIMM) are then proposed. It is demonstrated that these models can be degraded to the LSSVM. Furthermore, we propose some different memory impact functions for the MIMM and WIMM. The experimental results show that that our MIMM and WIMM have better generalization performance compared to the LSSVM and significant advantage in time cost compared to other memory models.
    Computing excited states of molecules using normalizing flows. (arXiv:2308.16468v1 [physics.chem-ph])
    We present a new nonlinear variational framework for simultaneously computing ground and excited states of quantum systems. Our approach is based on approximating wavefunctions in the linear span of basis functions that are augmented and optimized \emph{via} composition with normalizing flows. The accuracy and efficiency of our approach are demonstrated in the calculations of a large number of vibrational states of the triatomic H$_2$S molecule as well as ground and several excited electronic states of prototypical one-electron systems including the hydrogen atom, the molecular hydrogen ion, and a carbon atom in a single-active-electron approximation. The results demonstrate significant improvements in the accuracy of energy predictions and accelerated basis-set convergence even when using normalizing flows with a small number of parameters. The present approach can be also seen as the optimization of a set of intrinsic coordinates that best capture the underlying physics within the given basis set.
    Materials Informatics Transformer: A Language Model for Interpretable Materials Properties Prediction. (arXiv:2308.16259v1 [cs.LG])
    Recently, the remarkable capabilities of large language models (LLMs) have been illustrated across a variety of research domains such as natural language processing, computer vision, and molecular modeling. We extend this paradigm by utilizing LLMs for material property prediction by introducing our model Materials Informatics Transformer (MatInFormer). Specifically, we introduce a novel approach that involves learning the grammar of crystallography through the tokenization of pertinent space group information. We further illustrate the adaptability of MatInFormer by incorporating task-specific data pertaining to Metal-Organic Frameworks (MOFs). Through attention visualization, we uncover the key features that the model prioritizes during property prediction. The effectiveness of our proposed model is empirically validated across 14 distinct datasets, hereby underscoring its potential for high throughput screening through accurate material property prediction.
    On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint. (arXiv:2308.16425v1 [cs.LG])
    Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Built upon this, we establish the equivalence between implicit and explicit networks in high dimensions.
    ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding. (arXiv:2308.16336v1 [cs.CL])
    We present ToddlerBERTa, a BabyBERTa-like language model, exploring its capabilities through five different models with varied hyperparameters. Evaluating on BLiMP, SuperGLUE, MSGS, and a Supplement benchmark from the BabyLM challenge, we find that smaller models can excel in specific tasks, while larger models perform well with substantial data. Despite training on a smaller dataset, ToddlerBERTa demonstrates commendable performance, rivalling the state-of-the-art RoBERTa-base. The model showcases robust language understanding, even with single-sentence pretraining, and competes with baselines that leverage broader contextual information. Our work provides insights into hyperparameter choices, and data utilization, contributing to the advancement of language models.
    A Unified Analysis for the Subgradient Methods Minimizing Composite Nonconvex, Nonsmooth and Non-Lipschitz Functions. (arXiv:2308.16362v1 [math.OC])
    In this paper we propose a proximal subgradient method (Prox-SubGrad) for solving nonconvex and nonsmooth optimization problems without assuming Lipschitz continuity conditions. A number of subgradient upper bounds and their relationships are presented. By means of these upper bounding conditions, we establish some uniform recursive relations for the Moreau envelopes for weakly convex optimization. This uniform scheme simplifies and unifies the proof schemes to establish rate of convergence for Prox-SubGrad without assuming Lipschitz continuity. We present a novel convergence analysis in this context. Furthermore, we propose some new stochastic subgradient upper bounding conditions and establish convergence and iteration complexity rates for the stochastic subgradient method (Sto-SubGrad) to solve non-Lipschitz and nonsmooth stochastic optimization problems. In particular, for both deterministic and stochastic subgradient methods on weakly convex optimization problems without Lipschitz continuity, under any of the subgradient upper bounding conditions to be introduced in the paper, we show that $O(1/\sqrt{T})$ convergence rate holds in terms of the square of gradient of the Moreau envelope function, which further improves to be $O(1/{T})$ if, in addition, the uniform KL condition with exponent $1/2$ holds.
    Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff. (arXiv:2308.16454v1 [cs.CV])
    This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs). Although adversarial training (AT) improves robustness, it degrades the standard accuracy, thus yielding the tradeoff. To mitigate this tradeoff, we propose a novel AT method called ARREST, which comprises three components: (i) adversarial finetuning (AFT), (ii) representation-guided knowledge distillation (RGKD), and (iii) noisy replay (NR). AFT trains a DNN on adversarial examples by initializing its parameters with a DNN that is standardly pretrained on clean examples. RGKD and NR respectively entail a regularization term and an algorithm to preserve latent representations of clean examples during AFT. RGKD penalizes the distance between the representations of the standardly pretrained and AFT DNNs. NR switches input adversarial examples to nonadversarial ones when the representation changes significantly during AFT. By combining these components, ARREST achieves both high standard accuracy and robustness. Experimental results demonstrate that ARREST mitigates the tradeoff more effectively than previous AT-based methods do.
    Ten Years of Generative Adversarial Nets (GANs): A survey of the state-of-the-art. (arXiv:2308.16316v1 [cs.LG])
    Since their inception in 2014, Generative Adversarial Networks (GANs) have rapidly emerged as powerful tools for generating realistic and diverse data across various domains, including computer vision and other applied areas. Consisting of a discriminative network and a generative network engaged in a Minimax game, GANs have revolutionized the field of generative modeling. In February 2018, GAN secured the leading spot on the ``Top Ten Global Breakthrough Technologies List'' issued by the Massachusetts Science and Technology Review. Over the years, numerous advancements have been proposed, leading to a rich array of GAN variants, such as conditional GAN, Wasserstein GAN, CycleGAN, and StyleGAN, among many others. This survey aims to provide a general overview of GANs, summarizing the latent architecture, validation metrics, and application areas of the most widely recognized variants. We also delve into recent theoretical developments, exploring the profound connection between the adversarial principle underlying GAN and Jensen-Shannon divergence, while discussing the optimality characteristics of the GAN framework. The efficiency of GAN variants and their model architectures will be evaluated along with training obstacles as well as training solutions. In addition, a detailed discussion will be provided, examining the integration of GANs with newly developed deep learning frameworks such as Transformers, Physics-Informed Neural Networks, Large Language models, and Diffusion models. Finally, we reveal several issues as well as future research outlines in this field.
    AntM$^{2}$C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction. (arXiv:2308.16437v1 [cs.IR])
    Click-through rate (CTR) prediction is a crucial issue in recommendation systems. There has been an emergence of various public CTR datasets. However, existing datasets primarily suffer from the following limitations. Firstly, users generally click different types of items from multiple scenarios, and modeling from multiple scenarios can provide a more comprehensive understanding of users. Existing datasets only include data for the same type of items from a single scenario. Secondly, multi-modal features are essential in multi-scenario prediction as they address the issue of inconsistent ID encoding between different scenarios. The existing datasets are based on ID features and lack multi-modal features. Third, a large-scale dataset can provide a more reliable evaluation of models, fully reflecting the performance differences between models. The scale of existing datasets is around 100 million, which is relatively small compared to the real-world CTR prediction. To address these limitations, we propose AntM$^{2}$C, a Multi-Scenario Multi-Modal CTR dataset based on industrial data from Alipay. Specifically, AntM$^{2}$C provides the following advantages: 1) It covers CTR data of 5 different types of items, providing insights into the preferences of users for different items, including advertisements, vouchers, mini-programs, contents, and videos. 2) Apart from ID-based features, AntM$^{2}$C also provides 2 multi-modal features, raw text and image features, which can effectively establish connections between items with different IDs. 3) AntM$^{2}$C provides 1 billion CTR data with 200 features, including 200 million users and 6 million items. It is currently the largest-scale CTR dataset available. Based on AntM$^{2}$C, we construct several typical CTR tasks and provide comparisons with baseline methods. The dataset homepage is available at https://www.atecup.cn/home.
    Emergence of Segmentation with Minimalistic White-Box Transformers. (arXiv:2308.16271v1 [cs.CV])
    Transformer-like models for vision tasks have recently proven effective for a wide range of downstream applications such as segmentation and detection. Previous works have shown that segmentation properties emerge in vision transformers (ViTs) trained using self-supervised methods such as DINO, but not in those trained on supervised classification tasks. In this study, we probe whether segmentation emerges in transformer-based models solely as a result of intricate self-supervised learning mechanisms, or if the same emergence can be achieved under much broader conditions through proper design of the model architecture. Through extensive experimental results, we demonstrate that when employing a white-box transformer-like architecture known as CRATE, whose design explicitly models and pursues low-dimensional structures in the data distribution, segmentation properties, at both the whole and parts levels, already emerge with a minimalistic supervised training recipe. Layer-wise finer-grained analysis reveals that the emergent properties strongly corroborate the designed mathematical functions of the white-box network. Our results suggest a path to design white-box foundation models that are simultaneously highly performant and mathematically fully interpretable. Code is at \url{https://github.com/Ma-Lab-Berkeley/CRATE}.
    MASA-TCN: Multi-anchor Space-aware Temporal Convolutional Neural Networks for Continuous and Discrete EEG Emotion Recognition. (arXiv:2308.16207v1 [cs.LG])
    Emotion recognition using electroencephalogram (EEG) mainly has two scenarios: classification of the discrete labels and regression of the continuously tagged labels. Although many algorithms were proposed for classification tasks, there are only a few methods for regression tasks. For emotion regression, the label is continuous in time. A natural method is to learn the temporal dynamic patterns. In previous studies, long short-term memory (LSTM) and temporal convolutional neural networks (TCN) were utilized to learn the temporal contextual information from feature vectors of EEG. However, the spatial patterns of EEG were not effectively extracted. To enable the spatial learning ability of TCN towards better regression and classification performances, we propose a novel unified model, named MASA-TCN, for EEG emotion regression and classification tasks. The space-aware temporal layer enables TCN to additionally learn from spatial relations among EEG electrodes. Besides, a novel multi-anchor block with attentive fusion is proposed to learn dynamic temporal dependencies. Experiments on two publicly available datasets show MASA-TCN achieves higher results than the state-of-the-art methods for both EEG emotion regression and classification tasks. The code is available at https://github.com/yi-ding-cs/MASA-TCN.
    A Survey on Privacy in Graph Neural Networks: Attacks, Preservation, and Applications. (arXiv:2308.16375v1 [cs.LG])
    Graph Neural Networks (GNNs) have gained significant attention owing to their ability to handle graph-structured data and the improvement in practical applications. However, many of these models prioritize high utility performance, such as accuracy, with a lack of privacy consideration, which is a major concern in modern society where privacy attacks are rampant. To address this issue, researchers have started to develop privacy-preserving GNNs. Despite this progress, there is a lack of a comprehensive overview of the attacks and the techniques for preserving privacy in the graph domain. In this survey, we aim to address this gap by summarizing the attacks on graph data according to the targeted information, categorizing the privacy preservation techniques in GNNs, and reviewing the datasets and applications that could be used for analyzing/solving privacy issues in GNNs. We also outline potential directions for future research in order to build better privacy-preserving GNNs.
    GRASP: A Goodness-of-Fit Test for Classification Learning. (arXiv:2209.02064v2 [stat.ME] UPDATED)
    Performance of classifiers is often measured in terms of average accuracy on test data. Despite being a standard measure, average accuracy fails in characterizing the fit of the model to the underlying conditional law of labels given the features vector ($Y|X$), e.g. due to model misspecification, over fitting, and high-dimensionality. In this paper, we consider the fundamental problem of assessing the goodness-of-fit for a general binary classifier. Our framework does not make any parametric assumption on the conditional law $Y|X$, and treats that as a black box oracle model which can be accessed only through queries. We formulate the goodness-of-fit assessment problem as a tolerance hypothesis testing of the form \[ H_0: \mathbb{E}\Big[D_f\Big({\sf Bern}(\eta(X))\|{\sf Bern}(\hat{\eta}(X))\Big)\Big]\leq \tau\,, \] where $D_f$ represents an $f$-divergence function, and $\eta(x)$, $\hat{\eta}(x)$ respectively denote the true and an estimate likelihood for a feature vector $x$ admitting a positive label. We propose a novel test, called \grasp for testing $H_0$, which works in finite sample settings, no matter the features (distribution-free). We also propose model-X \grasp designed for model-X settings where the joint distribution of the features vector is known. Model-X \grasp uses this distributional information to achieve better power. We evaluate the performance of our tests through extensive numerical experiments.
    Learning Collaborative Information Dissemination with Graph-based Multi-Agent Reinforcement Learning. (arXiv:2308.16198v1 [cs.LG])
    In modern communication systems, efficient and reliable information dissemination is crucial for supporting critical operations across domains like disaster response, autonomous vehicles, and sensor networks. This paper introduces a Multi-Agent Reinforcement Learning (MARL) approach as a significant step forward in achieving more decentralized, efficient, and collaborative solutions. We propose a Decentralized-POMDP formulation for information dissemination, empowering each agent to independently decide on message forwarding. This constitutes a significant paradigm shift from traditional heuristics based on Multi-Point Relay (MPR) selection. Our approach harnesses Graph Convolutional Reinforcement Learning, employing Graph Attention Networks (GAT) with dynamic attention to capture essential network features. We propose two approaches, L-DGN and HL-DGN, which differ in the information that is exchanged among agents. We evaluate the performance of our decentralized approaches, by comparing them with a widely-used MPR heuristic, and we show that our trained policies are able to efficiently cover the network while bypassing the MPR set selection process. Our approach promises a first step toward bolstering the resilience of real-world broadcast communication infrastructures via learned, collaborative information dissemination.
    RetroBridge: Modeling Retrosynthesis with Markov Bridges. (arXiv:2308.16212v1 [q-bio.QM])
    Retrosynthesis planning is a fundamental challenge in chemistry which aims at designing reaction pathways from commercially available starting materials to a target molecule. Each step in multi-step retrosynthesis planning requires accurate prediction of possible precursor molecules given the target molecule and confidence estimates to guide heuristic search algorithms. We model single-step retrosynthesis planning as a distribution learning problem in a discrete state space. First, we introduce the Markov Bridge Model, a generative framework aimed to approximate the dependency between two intractable discrete distributions accessible via a finite sample of coupled data points. Our framework is based on the concept of a Markov bridge, a Markov process pinned at its endpoints. Unlike diffusion-based methods, our Markov Bridge Model does not need a tractable noise distribution as a sampling proxy and directly operates on the input product molecules as samples from the intractable prior distribution. We then address the retrosynthesis planning problem with our novel framework and introduce RetroBridge, a template-free retrosynthesis modeling approach that achieves state-of-the-art results on standard evaluation benchmarks.
    Deep Video Codec Control. (arXiv:2308.16215v1 [eess.IV])
    Lossy video compression is commonly used when transmitting and storing video data. Unified video codecs (e.g., H.264 or H.265) remain the \emph{de facto} standard, despite the availability of advanced (neural) compression approaches. Transmitting videos in the face of dynamic network bandwidth conditions requires video codecs to adapt to vastly different compression strengths. Rate control modules augment the codec's compression such that bandwidth constraints are satisfied and video distortion is minimized. While, both standard video codes and their rate control modules are developed to minimize video distortion w.r.t. human quality assessment, preserving the downstream performance of deep vision models is not considered. In this paper, we present the first end-to-end learnable deep video codec control considering both bandwidth constraints and downstream vision performance, while not breaking existing standardization. We demonstrate for two common vision tasks (semantic segmentation and optical flow estimation) and on two different datasets that our deep codec control better preserves downstream performance than using 2-pass average bit rate control while meeting dynamic bandwidth constraints and adhering to standardizations.
    A numerical approach for the fractional Laplacian via deep neural networks. (arXiv:2308.16272v1 [math.AP])
    We consider the fractional elliptic problem with Dirichlet boundary conditions on a bounded and convex domain $D$ of $\mathbb{R}^d$, with $d \geq 2$. In this paper, we perform a stochastic gradient descent algorithm that approximates the solution of the fractional problem via Deep Neural Networks. Additionally, we provide four numerical examples to test the efficiency of the algorithm, and each example will be studied for many values of $\alpha \in (1,2)$ and $d \geq 2$.
    Transformers Meet Directed Graphs. (arXiv:2302.00049v3 [cs.LG] UPDATED)
    Transformers were originally proposed as a sequence-to-sequence model for text but have become vital for a wide range of modalities, including images, audio, video, and undirected graphs. However, transformers for directed graphs are a surprisingly underexplored topic, despite their applicability to ubiquitous domains, including source code and logic circuits. In this work, we propose two direction- and structure-aware positional encodings for directed graphs: (1) the eigenvectors of the Magnetic Laplacian - a direction-aware generalization of the combinatorial Laplacian; (2) directional random walk encodings. Empirically, we show that the extra directionality information is useful in various downstream tasks, including correctness testing of sorting networks and source code understanding. Together with a data-flow-centric graph construction, our model outperforms the prior state of the art on the Open Graph Benchmark Code2 relatively by 14.7%.
    A Note on Randomized Kaczmarz Algorithm for Solving Doubly-Noisy Linear Systems. (arXiv:2308.16904v1 [math.NA])
    Large-scale linear systems, $Ax=b$, frequently arise in practice and demand effective iterative solvers. Often, these systems are noisy due to operational errors or faulty data-collection processes. In the past decade, the randomized Kaczmarz (RK) algorithm has been studied extensively as an efficient iterative solver for such systems. However, the convergence study of RK in the noisy regime is limited and considers measurement noise in the right-hand side vector, $b$. Unfortunately, in practice, that is not always the case; the coefficient matrix $A$ can also be noisy. In this paper, we analyze the convergence of RK for noisy linear systems when the coefficient matrix, $A$, is corrupted with both additive and multiplicative noise, along with the noisy vector, $b$. In our analyses, the quantity $\tilde R=\| \tilde A^{\dagger} \|_2^2 \|\tilde A \|_F^2$ influences the convergence of RK, where $\tilde A$ represents a noisy version of $A$. We claim that our analysis is robust and realistically applicable, as we do not require information about the noiseless coefficient matrix, $A$, and considering different conditions on noise, we can control the convergence of RK. We substantiate our theoretical findings by performing comprehensive numerical experiments.
    MGNN: Graph Neural Networks Inspired by Distance Geometry Problem. (arXiv:2201.12994v4 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) have emerged as a prominent research topic in the field of machine learning. Existing GNN models are commonly categorized into two types: spectral GNNs, which are designed based on polynomial graph filters, and spatial GNNs, which utilize a message-passing scheme as the foundation of the model. For the expressive power and universality of spectral GNNs, a natural approach is to improve the design of basis functions for better approximation ability. As for spatial GNNs, models like Graph Isomorphism Networks (GIN) analyze their expressive power based on Graph Isomorphism Tests. Recently, there have been attempts to establish connections between spatial GNNs and geometric concepts like curvature and cellular sheaves, as well as physical phenomena like oscillators. However, despite the recent progress, there is still a lack of comprehensive analysis regarding the universality of spatial GNNs from the perspectives of geometry and physics. In this paper, we propose MetricGNN (MGNN), a spatial GNN model inspired by the congruent-insensitivity property of classifiers in the classification phase of GNNs. We demonstrate that a GNN model is universal in the spatial domain if it can generate embedding matrices that are congruent to any given embedding matrix. This property is closely related to the Distance Geometry Problem (DGP). Since DGP is an NP-Hard combinatorial optimization problem, we propose optimizing an energy function derived from spring networks and the Multi-Dimensional Scaling (MDS) problem. This approach also allows our model to handle both homophilic and heterophilic graphs. Finally, we propose employing the iteration method to optimize our energy function. We extensively evaluate the effectiveness of our model through experiments conducted on both synthetic and real-world datasets. Our code is available at: https://github.com/GuanyuCui/MGNN.
    Joint Semantic-Native Communication and Inference via Minimal Simplicial Structures. (arXiv:2308.16789v1 [eess.SP])
    In this work, we study the problem of semantic communication and inference, in which a student agent (i.e. mobile device) queries a teacher agent (i.e. cloud sever) to generate higher-order data semantics living in a simplicial complex. Specifically, the teacher first maps its data into a k-order simplicial complex and learns its high-order correlations. For effective communication and inference, the teacher seeks minimally sufficient and invariant semantic structures prior to conveying information. These minimal simplicial structures are found via judiciously removing simplices selected by the Hodge Laplacians without compromising the inference query accuracy. Subsequently, the student locally runs its own set of queries based on a masked simplicial convolutional autoencoder (SCAE) leveraging both local and remote teacher's knowledge. Numerical results corroborate the effectiveness of the proposed approach in terms of improving inference query accuracy under different channel conditions and simplicial structures. Experiments on a coauthorship dataset show that removing simplices by ranking the Laplacian values yields a 85% reduction in payload size without sacrificing accuracy. Joint semantic communication and inference by masked SCAE improves query accuracy by 25% compared to local student based query and 15% compared to remote teacher based query. Finally, incorporating channel semantics is shown to effectively improve inference accuracy, notably at low SNR values.
    Irregular Traffic Time Series Forecasting Based on Asynchronous Spatio-Temporal Graph Convolutional Network. (arXiv:2308.16818v1 [cs.LG])
    Accurate traffic forecasting at intersections governed by intelligent traffic signals is critical for the advancement of an effective intelligent traffic signal control system. However, due to the irregular traffic time series produced by intelligent intersections, the traffic forecasting task becomes much more intractable and imposes three major new challenges: 1) asynchronous spatial dependency, 2) irregular temporal dependency among traffic data, and 3) variable-length sequence to be predicted, which severely impede the performance of current traffic forecasting methods. To this end, we propose an Asynchronous Spatio-tEmporal graph convolutional nEtwoRk (ASeer) to predict the traffic states of the lanes entering intelligent intersections in a future time window. Specifically, by linking lanes via a traffic diffusion graph, we first propose an Asynchronous Graph Diffusion Network to model the asynchronous spatial dependency between the time-misaligned traffic state measurements of lanes. After that, to capture the temporal dependency within irregular traffic state sequence, a learnable personalized time encoding is devised to embed the continuous time for each lane. Then we propose a Transformable Time-aware Convolution Network that learns meta-filters to derive time-aware convolution filters with transformable filter sizes for efficient temporal convolution on the irregular sequence. Furthermore, a Semi-Autoregressive Prediction Network consisting of a state evolution unit and a semiautoregressive predictor is designed to effectively and efficiently predict variable-length traffic state sequences. Extensive experiments on two real-world datasets demonstrate the effectiveness of ASeer in six metrics.
    Conditioning Score-Based Generative Models by Neuro-Symbolic Constraints. (arXiv:2308.16534v1 [cs.LG])
    Score-based and diffusion models have emerged as effective approaches for both conditional and unconditional generation. Still conditional generation is based on either a specific training of a conditional model or classifier guidance, which requires training a noise-dependent classifier, even when the classifier for uncorrupted data is given. We propose an approach to sample from unconditional score-based generative models enforcing arbitrary logical constraints, without any additional training. Firstly, we show how to manipulate the learned score in order to sample from an un-normalized distribution conditional on a user-defined constraint. Then, we define a flexible and numerically stable neuro-symbolic framework for encoding soft logical constraints. Combining these two ingredients we obtain a general, but approximate, conditional sampling algorithm. We further developed effective heuristics aimed at improving the approximation. Finally, we show the effectiveness of our approach for various types of constraints and data: tabular data, images and time series.
    Robust Representation Learning for Unreliable Partial Label Learning. (arXiv:2308.16718v1 [cs.LG])
    Partial Label Learning (PLL) is a type of weakly supervised learning where each training instance is assigned a set of candidate labels, but only one label is the ground-truth. However, this idealistic assumption may not always hold due to potential annotation inaccuracies, meaning the ground-truth may not be present in the candidate label set. This is known as Unreliable Partial Label Learning (UPLL) that introduces an additional complexity due to the inherent unreliability and ambiguity of partial labels, often resulting in a sub-optimal performance with existing methods. To address this challenge, we propose the Unreliability-Robust Representation Learning framework (URRL) that leverages unreliability-robust contrastive learning to help the model fortify against unreliable partial labels effectively. Concurrently, we propose a dual strategy that combines KNN-based candidate label set correction and consistency-regularization-based label disambiguation to refine label quality and enhance the ability of representation learning within the URRL framework. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art PLL methods on various datasets with diverse degrees of unreliability and ambiguity. Furthermore, we provide a theoretical analysis of our approach from the perspective of the expectation maximization (EM) algorithm. Upon acceptance, we pledge to make the code publicly accessible.
    Dynamic nsNet2: Efficient Deep Noise Suppression with Early Exiting. (arXiv:2308.16678v1 [cs.SD])
    Although deep learning has made strides in the field of deep noise suppression, leveraging deep architectures on resource-constrained devices still proved challenging. Therefore, we present an early-exiting model based on nsNet2 that provides several levels of accuracy and resource savings by halting computations at different stages. Moreover, we adapt the original architecture by splitting the information flow to take into account the injected dynamism. We show the trade-offs between performance and computational complexity based on established metrics.
    Curvature-based Pooling within Graph Neural Networks. (arXiv:2308.16516v1 [cs.LG])
    Over-squashing and over-smoothing are two critical issues, that limit the capabilities of graph neural networks (GNNs). While over-smoothing eliminates the differences between nodes making them indistinguishable, over-squashing refers to the inability of GNNs to propagate information over long distances, as exponentially many node states are squashed into fixed-size representations. Both phenomena share similar causes, as both are largely induced by the graph topology. To mitigate these problems in graph classification tasks, we propose CurvPool, a novel pooling method. CurvPool exploits the notion of curvature of a graph to adaptively identify structures responsible for both over-smoothing and over-squashing. By clustering nodes based on the Balanced Forman curvature, CurvPool constructs a graph with a more suitable structure, allowing deeper models and the combination of distant information. We compare it to other state-of-the-art pooling approaches and establish its competitiveness in terms of classification accuracy, computational complexity, and flexibility. CurvPool outperforms several comparable methods across all considered tasks. The most consistent results are achieved by pooling densely connected clusters using the sum aggregation, as this allows additional information about the size of each pool.
    A Causal Discovery Approach To Learn How Urban Form Shapes Sustainable Mobility Across Continents. (arXiv:2308.16599v1 [cs.LG])
    Global sustainability requires low-carbon urban transport systems, shaped by adequate infrastructure, deployment of low-carbon transport modes and shifts in travel behavior. To adequately implement alterations in infrastructure, it's essential to grasp the location-specific cause-and-effect mechanisms that the constructed environment has on travel. Yet, current research falls short in representing causal relationships between the 6D urban form variables and travel, generalizing across different regions, and modeling urban form effects at high spatial resolution. Here, we address all three gaps by utilizing a causal discovery and an explainable machine learning framework to detect urban form effects on intra-city travel based on high-resolution mobility data of six cities across three continents. We show that both distance to city center, demographics and density indirectly affect other urban form features. By considering the causal relationships, we find that location-specific influences align across cities, yet vary in magnitude. In addition, the spread of the city and the coverage of jobs across the city are the strongest determinants of travel-related emissions, highlighting the benefits of compact development and associated benefits. Differences in urban form effects across the cities call for a more holistic definition of 6D measures. Our work is a starting point for location-specific analysis of urban form effects on mobility behavior using causal discovery approaches, which is highly relevant for city planners and municipalities across continents.
    US-SFNet: A Spatial-Frequency Domain-based Multi-branch Network for Cervical Lymph Node Lesions Diagnoses in Ultrasound Images. (arXiv:2308.16738v1 [eess.IV])
    Ultrasound imaging serves as a pivotal tool for diagnosing cervical lymph node lesions. However, the diagnoses of these images largely hinge on the expertise of medical practitioners, rendering the process susceptible to misdiagnoses. Although rapidly developing deep learning has substantially improved the diagnoses of diverse ultrasound images, there remains a conspicuous research gap concerning cervical lymph nodes. The objective of our work is to accurately diagnose cervical lymph node lesions by leveraging a deep learning model. To this end, we first collected 3392 images containing normal lymph nodes, benign lymph node lesions, malignant primary lymph node lesions, and malignant metastatic lymph node lesions. Given that ultrasound images are generated by the reflection and scattering of sound waves across varied bodily tissues, we proposed the Conv-FFT Block. It integrates convolutional operations with the fast Fourier transform to more astutely model the images. Building upon this foundation, we designed a novel architecture, named US-SFNet. This architecture not only discerns variances in ultrasound images from the spatial domain but also adeptly captures microstructural alterations across various lesions in the frequency domain. To ascertain the potential of US-SFNet, we benchmarked it against 12 popular architectures through five-fold cross-validation. The results show that US-SFNet is SOTA and can achieve 92.89% accuracy, 90.46% precision, 89.95% sensitivity and 97.49% specificity, respectively.
    CktGNN: Circuit Graph Neural Network for Electronic Design Automation. (arXiv:2308.16406v1 [cs.LG])
    The electronic design automation of analog circuits has been a longstanding challenge in the integrated circuit field due to the huge design space and complex design trade-offs among circuit specifications. In the past decades, intensive research efforts have mostly been paid to automate the transistor sizing with a given circuit topology. By recognizing the graph nature of circuits, this paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing based on the encoder-dependent optimization subroutines. Particularly, CktGNN encodes circuit graphs using a two-level GNN framework (of nested GNN) where circuits are represented as combinations of subgraphs in a known subgraph basis. In this way, it significantly improves design efficiency by reducing the number of subgraphs to perform message passing. Nonetheless, another critical roadblock to advancing learning-assisted circuit design automation is a lack of public benchmarks to perform canonical assessment and reproducible research. To tackle the challenge, we introduce Open Circuit Benchmark (OCB), an open-sourced dataset that contains $10$K distinct operational amplifiers with carefully-extracted circuit specifications. OCB is also equipped with communicative circuit generation and evaluation capabilities such that it can help to generalize CktGNN to design various analog circuits by producing corresponding datasets. Experiments on OCB show the extraordinary advantages of CktGNN through representation-based optimization frameworks over other recent powerful GNN baselines and human experts' manual designs. Our work paves the way toward a learning-based open-sourced design automation for analog circuits. Our source code is available at \url{https://github.com/zehao-dong/CktGNN}.
    DECODE: DilatEd COnvolutional neural network for Detecting Extreme-mass-ratio inspirals. (arXiv:2308.16422v1 [astro-ph.IM])
    The detection of Extreme Mass Ratio Inspirals (EMRIs) is intricate due to their complex waveforms, extended duration, and low signal-to-noise ratio (SNR), making them more challenging to be identified compared to compact binary coalescences. While matched filtering-based techniques are known for their computational demands, existing deep learning-based methods primarily handle time-domain data and are often constrained by data duration and SNR. In addition, most existing work ignores time-delay interferometry (TDI) and applies the long-wavelength approximation in detector response calculations, thus limiting their ability to handle laser frequency noise. In this study, we introduce DECODE, an end-to-end model focusing on EMRI signal detection by sequence modeling in the frequency domain. Centered around a dilated causal convolutional neural network, trained on synthetic data considering TDI-1.5 detector response, DECODE can efficiently process a year's worth of multichannel TDI data with an SNR of around 50. We evaluate our model on 1-year data with accumulated SNR ranging from 50 to 120 and achieve a true positive rate of 96.3% at a false positive rate of 1%, keeping an inference time of less than 0.01 seconds. With the visualization of three showcased EMRI signals for interpretability and generalization, DECODE exhibits strong potential for future space-based gravitational wave data analyses.
    Learning Diverse Features in Vision Transformers for Improved Generalization. (arXiv:2308.16274v1 [cs.CV])
    Deep learning models often rely only on a small set of features even when there is a rich set of predictive signals in the training data. This makes models brittle and sensitive to distribution shifts. In this work, we first examine vision transformers (ViTs) and find that they tend to extract robust and spurious features with distinct attention heads. As a result of this modularity, their performance under distribution shifts can be significantly improved at test time by pruning heads corresponding to spurious features, which we demonstrate using an "oracle selection" on validation data. Second, we propose a method to further enhance the diversity and complementarity of the learned features by encouraging orthogonality of the attention heads' input gradients. We observe improved out-of-distribution performance on diagnostic benchmarks (MNIST-CIFAR, Waterbirds) as a consequence of the enhanced diversity of features and the pruning of undesirable heads.
    Emoji Promotes Developer Participation and Issue Resolution on GitHub. (arXiv:2308.16360v1 [cs.CY])
    Although remote working is increasingly adopted during the pandemic, many are concerned by the low-efficiency in the remote working. Missing in text-based communication are non-verbal cues such as facial expressions and body language, which hinders the effective communication and negatively impacts the work outcomes. Prevalent on social media platforms, emojis, as alternative non-verbal cues, are gaining popularity in the virtual workspaces well. In this paper, we study how emoji usage influences developer participation and issue resolution in virtual workspaces. To this end, we collect GitHub issues for a one-year period and apply causal inference techniques to measure the causal effect of emojis on the outcome of issues, controlling for confounders such as issue content, repository, and author information. We find that emojis can significantly reduce the resolution time of issues and attract more user participation. We also compare the heterogeneous effect on different types of issues. These findings deepen our understanding of the developer communities, and they provide design implications on how to facilitate interactions and broaden developer participation.
    A Policy Adaptation Method for Implicit Multitask Reinforcement Learning Problems. (arXiv:2308.16471v1 [cs.RO])
    In dynamic motion generation tasks, including contact and collisions, small changes in policy parameters can lead to extremely different returns. For example, in soccer, the ball can fly in completely different directions with a similar heading motion by slightly changing the hitting position or the force applied to the ball or when the friction of the ball varies. However, it is difficult to imagine that completely different skills are needed for heading a ball in different directions. In this study, we proposed a multitask reinforcement learning algorithm for adapting a policy to implicit changes in goals or environments in a single motion category with different reward functions or physical parameters of the environment. We evaluated the proposed method on the ball heading task using a monopod robot model. The results showed that the proposed method can adapt to implicit changes in the goal positions or the coefficients of restitution of the ball, whereas the standard domain randomization approach cannot cope with different task settings.
    Classification of Anomalies in Telecommunication Network KPI Time Series. (arXiv:2308.16279v1 [cs.LG])
    The increasing complexity and scale of telecommunication networks have led to a growing interest in automated anomaly detection systems. However, the classification of anomalies detected on network Key Performance Indicators (KPI) has received less attention, resulting in a lack of information about anomaly characteristics and classification processes. To address this gap, this paper proposes a modular anomaly classification framework. The framework assumes separate entities for the anomaly classifier and the detector, allowing for a distinct treatment of anomaly detection and classification tasks on time series. The objectives of this study are (1) to develop a time series simulator that generates synthetic time series resembling real-world network KPI behavior, (2) to build a detection model to identify anomalies in the time series, (3) to build classification models that accurately categorize detected anomalies into predefined classes (4) to evaluate the classification framework performance on simulated and real-world network KPI time series. This study has demonstrated the good performance of the anomaly classification models trained on simulated anomalies when applied to real-world network time series data.
    Deep Inductive Logic Programming meets Reinforcement Learning. (arXiv:2308.16210v1 [cs.LG])
    One approach to explaining the hierarchical levels of understanding within a machine learning model is the symbolic method of inductive logic programming (ILP), which is data efficient and capable of learning first-order logic rules that can entail data behaviour. A differentiable extension to ILP, so-called differentiable Neural Logic (dNL) networks, are able to learn Boolean functions as their neural architecture includes symbolic reasoning. We propose an application of dNL in the field of Relational Reinforcement Learning (RRL) to address dynamic continuous environments. This represents an extension of previous work in applying dNL-based ILP in RRL settings, as our proposed model updates the architecture to enable it to solve problems in continuous RL environments. The goal of this research is to improve upon current ILP methods for use in RRL by incorporating non-linear continuous predicates, allowing RRL agents to reason and make decisions in dynamic and continuous environments.  ( 2 min )
    Symmetry Preservation in Hamiltonian Systems: Simulation and Learning. (arXiv:2308.16331v1 [math-ph])
    This work presents a general geometric framework for simulating and learning the dynamics of Hamiltonian systems that are invariant under a Lie group of transformations. This means that a group of symmetries is known to act on the system respecting its dynamics and, as a consequence, Noether's Theorem, conserved quantities are observed. We propose to simulate and learn the mappings of interest through the construction of $G$-invariant Lagrangian submanifolds, which are pivotal objects in symplectic geometry. A notable property of our constructions is that the simulated/learned dynamics also preserves the same conserved quantities as the original system, resulting in a more faithful surrogate of the original dynamics than non-symmetry aware methods, and in a more accurate predictor of non-observed trajectories. Furthermore, our setting is able to simulate/learn not only Hamiltonian flows, but any Lie group-equivariant symplectic transformation. Our designs leverage pivotal techniques and concepts in symplectic geometry and geometric mechanics: reduction theory, Noether's Theorem, Lagrangian submanifolds, momentum mappings, and coisotropic reduction among others. We also present methods to learn Poisson transformations while preserving the underlying geometry and how to endow non-geometric integrators with geometric properties. Thus, this work presents a novel attempt to harness the power of symplectic and Poisson geometry towards simulating and learning problems.  ( 2 min )
    SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills. (arXiv:2308.16369v1 [cs.LG])
    Large Language Model (LLM) inference consists of two distinct phases - prefill phase which processes the input prompt and decode phase which generates output tokens autoregressively. While the prefill phase effectively saturates GPU compute at small batch sizes, the decode phase results in low compute utilization as it generates one token at a time per request. The varying prefill and decode times also lead to imbalance across micro-batches when using pipeline parallelism, resulting in further inefficiency due to bubbles. We present SARATHI to address these challenges. SARATHI employs chunked-prefills, which splits a prefill request into equal sized chunks, and decode-maximal batching, which constructs a batch using a single prefill chunk and populates the remaining slots with decodes. During inference, the prefill chunk saturates GPU compute, while the decode requests 'piggyback' and cost up to an order of magnitude less compared to a decode-only batch. Chunked-prefills allows constructing multiple decode-maximal batches from a single prefill request, maximizing coverage of decodes that can piggyback. Furthermore, the uniform compute design of these batches ameliorates the imbalance between micro-batches, significantly reducing pipeline bubbles. Our techniques yield significant improvements in inference performance across models and hardware. For the LLaMA-13B model on A6000 GPU, SARATHI improves decode throughput by up to 10x, and accelerates end-to-end throughput by up to 1.33x. For LLaMa-33B on A100 GPU, we achieve 1.25x higher end-to-end-throughput and up to 4.25x higher decode throughput. When used with pipeline parallelism on GPT-3, SARATHI reduces bubbles by 6.29x, resulting in an end-to-end throughput improvement of 1.91x.  ( 3 min )
  • Open

    Invertible normalizing flow neural networks by JKO scheme. (arXiv:2212.14424v2 [stat.ML] UPDATED)
    Normalizing flow is a class of deep generative models for efficient sampling and density estimation. In practice, the flow often appears as a chain of invertible neural network blocks; to facilitate training, existing works have regularized flow trajectories and designed special network architectures. The current paper develops a neural ODE flow network inspired by the Jordan-Kinderleherer-Otto (JKO) scheme, which allows efficient block-wise training of the residual blocks without sampling SDE trajectories or inner loops of score matching or variational learning. As the JKO scheme unfolds the dynamic of gradient flow, the proposed model naturally stacks residual network blocks one by one, reducing the memory load and difficulty in performing end-to-end deep flow network training. We also develop adaptive time reparameterization of the flow network with a progressive refinement of the trajectory in probability space, which improves the model training efficiency and accuracy in practice. Using numerical experiments with synthetic and real data, we show that the proposed JKO-iFlow model achieves similar or better performance in generating new samples compared with the existing flow and diffusion models at a significantly reduced computational and memory cost.
    Leveraging Image-based Generative Adversarial Networks for Time Series Generation. (arXiv:2112.08060v2 [cs.LG] UPDATED)
    Generative models for images have gained significant attention in computer vision and natural language processing due to their ability to generate realistic samples from complex data distributions. To leverage the advances of image-based generative models for the time series domain, we propose a two-dimensional image representation for time series, the Extended Intertemporal Return Plot (XIRP). Our approach captures the intertemporal time series dynamics in a scale-invariant and invertible way, reducing training time and improving sample quality. We benchmark synthetic XIRPs obtained by an off-the-shelf Wasserstein GAN with gradient penalty (WGAN-GP) to other image representations and models regarding similarity and predictive ability metrics. Our novel, validated image representation for time series consistently and significantly outperforms a state-of-the-art RNN-based generative model regarding predictive ability. Further, we introduce an improved stochastic inversion to substantially improve simulation quality regardless of the representation and provide the prospect of transfer potentials in other domains.
    Least Squares Maximum and Weighted Generalization-Memorization Machines. (arXiv:2308.16456v1 [stat.ML])
    In this paper, we propose a new way of remembering by introducing a memory influence mechanism for the least squares support vector machine (LSSVM). Without changing the equation constraints of the original LSSVM, this mechanism, allows an accurate partitioning of the training set without overfitting. The maximum memory impact model (MIMM) and the weighted impact memory model (WIMM) are then proposed. It is demonstrated that these models can be degraded to the LSSVM. Furthermore, we propose some different memory impact functions for the MIMM and WIMM. The experimental results show that that our MIMM and WIMM have better generalization performance compared to the LSSVM and significant advantage in time cost compared to other memory models.
    Temporal-spatial model via Trend Filtering. (arXiv:2308.16172v2 [stat.ME] UPDATED)
    This research focuses on the estimation of a non-parametric regression function designed for data with simultaneous time and space dependencies. In such a context, we study the Trend Filtering, a nonparametric estimator introduced by \cite{mammen1997locally} and \cite{rudin1992nonlinear}. For univariate settings, the signals we consider are assumed to have a kth weak derivative with bounded total variation, allowing for a general degree of smoothness. In the multivariate scenario, we study a $K$-Nearest Neighbor fused lasso estimator as in \cite{padilla2018adaptive}, employing an ADMM algorithm, suitable for signals with bounded variation that adhere to a piecewise Lipschitz continuity criterion. By aligning with lower bounds, the minimax optimality of our estimators is validated. A unique phase transition phenomenon, previously uncharted in Trend Filtering studies, emerges through our analysis. Both Simulation studies and real data applications underscore the superior performance of our method when compared with established techniques in the existing literature.
    StyleDiff: Attribute Comparison Between Unlabeled Datasets in Latent Disentangled Space. (arXiv:2303.05102v2 [stat.ML] UPDATED)
    One major challenge in machine learning applications is coping with mismatches between the datasets used in the development and those obtained in real-world applications. These mismatches may lead to inaccurate predictions and errors, resulting in poor product quality and unreliable systems. In this study, we propose StyleDiff to inform developers of the differences between the two datasets for the steady development of machine learning systems. Using disentangled image spaces obtained from recently proposed generative models, StyleDiff compares the two datasets by focusing on attributes in the images and provides an easy-to-understand analysis of the differences between the datasets. The proposed StyleDiff performs in $O (d N\log N)$, where $N$ is the size of the datasets and $d$ is the number of attributes, enabling the application to large datasets. We demonstrate that StyleDiff accurately detects differences between datasets and presents them in an understandable format using, for example, driving scenes datasets.
    Biclustering Methods via Sparse Penalty. (arXiv:2308.14388v2 [stat.ML] UPDATED)
    In this paper, we first reviewed several biclustering methods that are used to identify the most significant clusters in gene expression data. Here we mainly focused on the SSVD(sparse SVD) method and tried a new sparse penalty named "Prenet penalty" which has been used only in factor analysis to gain sparsity. Then in the simulation study, we tried different types of generated datasets (with different sparsity and dimension) and tried 1-layer approximation then for k-layers which shows the mixed Prenet penalty is very effective for non-overlapped data. Finally, we used some real gene expression data to show the behavior of our methods.
    Multi-Response Heteroscedastic Gaussian Process Models and Their Inference. (arXiv:2308.15370v2 [stat.ML] UPDATED)
    Despite the widespread utilization of Gaussian process models for versatile nonparametric modeling, they exhibit limitations in effectively capturing abrupt changes in function smoothness and accommodating relationships with heteroscedastic errors. Addressing these shortcomings, the heteroscedastic Gaussian process (HeGP) regression seeks to introduce flexibility by acknowledging the variability of residual variances across covariates in the regression model. In this work, we extend the HeGP concept, expanding its scope beyond regression tasks to encompass classification and state-space models. To achieve this, we propose a novel framework where the Gaussian process is coupled with a covariate-induced precision matrix process, adopting a mixture formulation. This approach enables the modeling of heteroscedastic covariance functions across covariates. To mitigate the computational challenges posed by sampling, we employ variational inference to approximate the posterior and facilitate posterior predictive modeling. Additionally, our training process leverages an EM algorithm featuring closed-form M-step updates to efficiently evaluate the heteroscedastic covariance function. A notable feature of our model is its consistent performance on multivariate responses, accommodating various types (continuous or categorical) seamlessly. Through a combination of simulations and real-world applications in climatology, we illustrate the model's prowess and advantages. By overcoming the limitations of traditional Gaussian process models, our proposed framework offers a robust and versatile tool for a wide array of applications.
    Karhunen-Lo\`eve Data Imputation in High Contrast Imaging. (arXiv:2308.16912v1 [astro-ph.IM])
    Detection and characterization of extended structures is a crucial goal in high contrast imaging. However, these structures face challenges in data reduction, leading to over-subtraction from speckles and self-subtraction with most existing methods. Iterative post-processing methods offer promising results, but their integration into existing pipelines is hindered by selective algorithms, high computational cost, and algorithmic regularization. To address this for reference differential imaging (RDI), here we propose the data imputation concept to Karhunen-Lo\`eve transform (DIKL) by modifying two steps in the standard Karhunen-Lo\`eve image projection (KLIP) method. Specifically, we partition an image to two matrices: an anchor matrix which focuses only on the speckles to obtain the DIKL coefficients, and a boat matrix which focuses on the regions of astrophysical interest for speckle removal using DIKL components. As an analytical approach, DIKL achieves high-quality results with significantly reduced computational cost (~3 orders of magnitude less than iterative methods). Being a derivative method of KLIP, DIKL is seamlessly integrable into high contrast imaging pipelines for RDI observations.  ( 3 min )
    Approximate Bayesian inference from noisy likelihoods with Gaussian process emulated MCMC. (arXiv:2104.03942v2 [stat.ME] UPDATED)
    We present a framework for approximate Bayesian inference when only a limited number of noisy log-likelihood evaluations can be obtained due to computational constraints, which is becoming increasingly common for applications of complex models. We model the log-likelihood function using a Gaussian process (GP) and the main methodological innovation is to apply this model to emulate the progression that an exact Metropolis-Hastings (MH) sampler would take if it was applicable. Informative log-likelihood evaluation locations are selected using a sequential experimental design strategy until the MH accept/reject decision is done accurately enough according to the GP model. The resulting approximate sampler is conceptually simple and sample-efficient. It is also more robust to violations of GP modelling assumptions compared with earlier, related "Bayesian optimisation-like" methods tailored for Bayesian inference. We discuss some theoretical aspects and various interpretations of the resulting approximate MH sampler, and demonstrate its benefits in the context of Bayesian and generalised Bayesian likelihood-free inference for simulator-based statistical models.  ( 2 min )
    Branches of a Tree: Taking Derivatives of Programs with Discrete and Branching Randomness in High Energy Physics. (arXiv:2308.16680v1 [stat.ML])
    We propose to apply several gradient estimation techniques to enable the differentiation of programs with discrete randomness in High Energy Physics. Such programs are common in High Energy Physics due to the presence of branching processes and clustering-based analysis. Thus differentiating such programs can open the way for gradient based optimization in the context of detector design optimization, simulator tuning, or data analysis and reconstruction optimization. We discuss several possible gradient estimation strategies, including the recent Stochastic AD method, and compare them in simplified detector design experiments. In doing so we develop, to the best of our knowledge, the first fully differentiable branching program.  ( 2 min )
    On-Demand Communication for Asynchronous Multi-Agent Bandits. (arXiv:2302.07446v2 [cs.LG] UPDATED)
    This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where agents operate asynchronously -- agent pull times and rates are unknown, irregular, and heterogeneous -- and face the same instance of a K-armed bandit problem. Agents can share reward information to speed up the learning process at additional communication costs. We propose ODC, an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times. ODC is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of agents. ODC is a generic protocol that can be integrated into most cooperative bandit algorithms without degrading their performance. We then incorporate ODC into the natural extensions of UCB and AAE algorithms and propose two communication-efficient cooperative algorithms. Our analysis shows that both algorithms are near-optimal in regret.  ( 2 min )
    A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks. (arXiv:2304.14994v2 [cs.LG] UPDATED)
    Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic forgetting impairs the applicability of this approach to initial value problems (IVPs). In an alternative local-in-time approach, the optimization problem can be converted into an ordinary differential equation (ODE) on the network parameters and the solution propagated forward in time; however, we demonstrate that current methods based on this approach suffer from two key issues. First, following the ODE produces an uncontrolled growth in the conditioning of the problem, ultimately leading to unacceptably large numerical errors. Second, as the ODE methods scale cubically with the number of model parameters, they are restricted to small neural networks, significantly limiting their ability to represent intricate PDE initial conditions and solutions. Building on these insights, we develop Neural IVP, an ODE based IVP solver which prevents the network from getting ill-conditioned and runs in time linear in the number of parameters, enabling us to evolve the dynamics of challenging PDEs with neural networks.  ( 3 min )
    Calibrated Explanations for Regression. (arXiv:2308.16245v1 [cs.LG])
    Artificial Intelligence (AI) is often an integral part of modern decision support systems (DSSs). The best-performing predictive models used in AI-based DSSs lack transparency. Explainable Artificial Intelligence (XAI) aims to create AI systems that can explain their rationale to human users. Local explanations in XAI can provide information about the causes of individual predictions in terms of feature importance. However, a critical drawback of existing local explanation methods is their inability to quantify the uncertainty associated with a feature's importance. This paper introduces an extension of a feature importance explanation method, Calibrated Explanations (CE), previously only supporting classification, with support for standard regression and probabilistic regression, i.e., the probability that the target is above an arbitrary threshold. The extension for regression keeps all the benefits of CE, such as calibration of the prediction from the underlying model with confidence intervals, uncertainty quantification of feature importance, and allows both factual and counterfactual explanations. CE for standard regression provides fast, reliable, stable, and robust explanations. CE for probabilistic regression provides an entirely new way of creating probabilistic explanations from any ordinary regression model and with a dynamic selection of thresholds. The performance of CE for probabilistic regression regarding stability and speed is comparable to LIME. The method is model agnostic with easily understood conditional rules. An implementation in Python is freely available on GitHub and for installation using pip making the results in this paper easily replicable.  ( 2 min )
    Multiple Augmented Reduced Rank Regression for Pan-Cancer Analysis. (arXiv:2308.16333v1 [stat.ME])
    Statistical approaches that successfully combine multiple datasets are more powerful, efficient, and scientifically informative than separate analyses. To address variation architectures correctly and comprehensively for high-dimensional data across multiple sample sets (i.e., cohorts), we propose multiple augmented reduced rank regression (maRRR), a flexible matrix regression and factorization method to concurrently learn both covariate-driven and auxiliary structured variation. We consider a structured nuclear norm objective that is motivated by random matrix theory, in which the regression or factorization terms may be shared or specific to any number of cohorts. Our framework subsumes several existing methods, such as reduced rank regression and unsupervised multi-matrix factorization approaches, and includes a promising novel approach to regression and factorization of a single dataset (aRRR) as a special case. Simulations demonstrate substantial gains in power from combining multiple datasets, and from parsimoniously accounting for all structured variation. We apply maRRR to gene expression data from multiple cancer types (i.e., pan-cancer) from TCGA, with somatic mutations as covariates. The method performs well with respect to prediction and imputation of held-out data, and provides new insights into mutation-driven and auxiliary variation that is shared or specific to certain cancer types.  ( 2 min )
    Forecasting Emergency Department Crowding with Advanced Machine Learning Models and Multivariable Input. (arXiv:2308.16544v1 [cs.LG])
    Emergency department (ED) crowding is a significant threat to patient safety and it has been repeatedly associated with increased mortality. Forecasting future service demand has the potential patient outcomes. Despite active research on the subject, several gaps remain: 1) proposed forecasting models have become outdated due to quick influx of advanced machine learning models (ML), 2) amount of multivariable input data has been limited and 3) discrete performance metrics have been rarely reported. In this study, we document the performance of a set of advanced ML models in forecasting ED occupancy 24 hours ahead. We use electronic health record data from a large, combined ED with an extensive set of explanatory variables, including the availability of beds in catchment area hospitals, traffic data from local observation stations, weather variables, etc. We show that N-BEATS and LightGBM outpeform benchmarks with 11 % and 9 % respective improvements and that DeepAR predicts next day crowding with an AUC of 0.76 (95 % CI 0.69-0.84). To the best of our knowledge, this is the first study to document the superiority of LightGBM and N-BEATS over statistical benchmarks in the context of ED forecasting.  ( 2 min )
    Hypergraph Structure Inference From Data Under Smoothness Prior. (arXiv:2308.14172v2 [cs.LG] UPDATED)
    Hypergraphs are important for processing data with higher-order relationships involving more than two entities. In scenarios where explicit hypergraphs are not readily available, it is desirable to infer a meaningful hypergraph structure from the node features to capture the intrinsic relations within the data. However, existing methods either adopt simple pre-defined rules that fail to precisely capture the distribution of the potential hypergraph structure, or learn a mapping between hypergraph structures and node features but require a large amount of labelled data, i.e., pre-existing hypergraph structures, for training. Both restrict their applications in practical scenarios. To fill this gap, we propose a novel smoothness prior that enables us to design a method to infer the probability for each potential hyperedge without labelled data as supervision. The proposed prior indicates features of nodes in a hyperedge are highly correlated by the features of the hyperedge containing them. We use this prior to derive the relation between the hypergraph structure and the node features via probabilistic modelling. This allows us to develop an unsupervised inference method to estimate the probability for each potential hyperedge via solving an optimisation problem that has an analytical solution. Experiments on both synthetic and real-world data demonstrate that our method can learn meaningful hypergraph structures from data more efficiently than existing hypergraph structure inference methods.  ( 3 min )
    A stochastic block model for community detection in attributed networks. (arXiv:2308.16382v1 [cs.SI])
    Community detection is an important content in complex network analysis. The existing community detection methods in attributed networks mostly focus on only using network structure, while the methods of integrating node attributes is mainly for the traditional community structures, and cannot detect multipartite structures and mixture structures in network. In addition, the model-based community detection methods currently proposed for attributed networks do not fully consider unique topology information of nodes, such as betweenness centrality and clustering coefficient. Therefore, a stochastic block model that integrates betweenness centrality and clustering coefficient of nodes for community detection in attributed networks, named BCSBM, is proposed in this paper. Different from other generative models for attributed networks, the generation process of links and attributes in BCSBM model follows the Poisson distribution, and the probability between community is considered based on the stochastic block model. Moreover, the betweenness centrality and clustering coefficient of nodes are introduced into the process of links and attributes generation. Finally, the expectation maximization algorithm is employed to estimate the parameters of the BCSBM model, and the node-community memberships is obtained through the hard division process, so the community detection is completed. By experimenting on six real-work networks containing different network structures, and comparing with the community detection results of five algorithms, the experimental results show that the BCSBM model not only inherits the advantages of the stochastic block model and can detect various network structures, but also has good data fitting ability due to introducing the betweenness centrality and clustering coefficient of nodes. Overall, the performance of this model is superior to other five compared algorithms.  ( 3 min )
    GRASP: A Goodness-of-Fit Test for Classification Learning. (arXiv:2209.02064v2 [stat.ME] UPDATED)
    Performance of classifiers is often measured in terms of average accuracy on test data. Despite being a standard measure, average accuracy fails in characterizing the fit of the model to the underlying conditional law of labels given the features vector ($Y|X$), e.g. due to model misspecification, over fitting, and high-dimensionality. In this paper, we consider the fundamental problem of assessing the goodness-of-fit for a general binary classifier. Our framework does not make any parametric assumption on the conditional law $Y|X$, and treats that as a black box oracle model which can be accessed only through queries. We formulate the goodness-of-fit assessment problem as a tolerance hypothesis testing of the form \[ H_0: \mathbb{E}\Big[D_f\Big({\sf Bern}(\eta(X))\|{\sf Bern}(\hat{\eta}(X))\Big)\Big]\leq \tau\,, \] where $D_f$ represents an $f$-divergence function, and $\eta(x)$, $\hat{\eta}(x)$ respectively denote the true and an estimate likelihood for a feature vector $x$ admitting a positive label. We propose a novel test, called \grasp for testing $H_0$, which works in finite sample settings, no matter the features (distribution-free). We also propose model-X \grasp designed for model-X settings where the joint distribution of the features vector is known. Model-X \grasp uses this distributional information to achieve better power. We evaluate the performance of our tests through extensive numerical experiments.  ( 2 min )
    Everything, Everywhere All in One Evaluation: Using Multiverse Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness. (arXiv:2308.16681v1 [stat.ML])
    A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. When designed well, these systems promise more objective decisions while saving large amounts of resources and freeing up human time. However, when ADM systems are not designed well, they can lead to unfair decisions which discriminate against societal groups. The downstream effects of ADMs critically depend on the decisions made during the systems' design and implementation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these design decisions are made implicitly, without knowing exactly how they will influence the final system. It is therefore important to make explicit the decisions made during the design of ADM systems and understand how these decisions affect the fairness of the resulting system. To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit design decisions into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible "universes" of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can see how and which decisions impact fairness. We demonstrate how multiverse analyses can be used to better understand variability and robustness of algorithmic fairness using an exemplary case study of predicting public health coverage of vulnerable populations for potential interventions. Our results illustrate how decisions during the design of a machine learning system can have surprising effects on its fairness and how to detect these effects using multiverse analysis.  ( 3 min )
    Generative Sliced MMD Flows with Riesz Kernels. (arXiv:2305.11463v2 [cs.LG] UPDATED)
    Maximum mean discrepancy (MMD) flows suffer from high computational costs in large scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = - \Vert x-y\Vert^r$, $r \in (0,2)$ have exceptional properties which allow their efficient computation. We prove that the MMD of Riesz kernels coincides with the MMD of their sliced version. As a consequence, the computation of gradients of MMDs can be performed in the one-dimensional setting. Here, for $r=1$, a simple sorting algorithm can be applied to reduce the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As another interesting follow-up result, the MMD of compactly supported measures can be estimated from above and below by the Wasserstein-1 distance. For the implementations we approximate the gradient of the sliced MMD by using only a finite number $P$ of slices. We show that the resulting error has complexity $O(\sqrt{d/P})$, where $d$ is the data dimension. These results enable us to train generative models by approximating MMD gradient flows by neural networks even for image applications. We demonstrate the efficiency of our model by image generation on MNIST, FashionMNIST and CIFAR10.  ( 2 min )
    Learning Channel Importance for High Content Imaging with Interpretable Deep Input Channel Mixing. (arXiv:2308.16637v1 [cs.CV])
    Uncovering novel drug candidates for treating complex diseases remain one of the most challenging tasks in early discovery research. To tackle this challenge, biopharma research established a standardized high content imaging protocol that tags different cellular compartments per image channel. In order to judge the experimental outcome, the scientist requires knowledge about the channel importance with respect to a certain phenotype for decoding the underlying biology. In contrast to traditional image analysis approaches, such experiments are nowadays preferably analyzed by deep learning based approaches which, however, lack crucial information about the channel importance. To overcome this limitation, we present a novel approach which utilizes multi-spectral information of high content images to interpret a certain aspect of cellular biology. To this end, we base our method on image blending concepts with alpha compositing for an arbitrary number of channels. More specifically, we introduce DCMIX, a lightweight, scaleable and end-to-end trainable mixing layer which enables interpretable predictions in high content imaging while retaining the benefits of deep learning based methods. We employ an extensive set of experiments on both MNIST and RXRX1 datasets, demonstrating that DCMIX learns the biologically relevant channel importance without scarifying prediction performance.  ( 2 min )
    High Dimensional Time Series Regression Models: Applications to Statistical Learning Methods. (arXiv:2308.16192v1 [econ.EM])
    These lecture notes provide an overview of existing methodologies and recent developments for estimation and inference with high dimensional time series regression models. First, we present main limit theory results for high dimensional dependent data which is relevant to covariance matrix structures as well as to dependent time series sequences. Second, we present main aspects of the asymptotic theory related to time series regression models with many covariates. Third, we discuss various applications of statistical learning methodologies for time series analysis purposes.  ( 2 min )
    Information Theoretically Optimal Sample Complexity of Learning Dynamical Directed Acyclic Graphs. (arXiv:2308.16859v1 [stat.ML])
    In this article, the optimal sample complexity of learning the underlying interaction/dependencies of a Linear Dynamical System (LDS) over a Directed Acyclic Graph (DAG) is studied. The sample complexity of learning a DAG's structure is well-studied for static systems, where the samples of nodal states are independent and identically distributed (i.i.d.). However, such a study is less explored for DAGs with dynamical systems, where the nodal states are temporally correlated. We call such a DAG underlying an LDS as \emph{dynamical} DAG (DDAG). In particular, we consider a DDAG where the nodal dynamics are driven by unobserved exogenous noise sources that are wide-sense stationary (WSS) in time but are mutually uncorrelated, and have the same {power spectral density (PSD)}. Inspired by the static settings, a metric and an algorithm based on the PSD matrix of the observed time series are proposed to reconstruct the DDAG. The equal noise PSD assumption can be relaxed such that identifiability conditions for DDAG reconstruction are not violated. For the LDS with WSS (sub) Gaussian exogenous noise sources, it is shown that the optimal sample complexity (or length of state trajectory) needed to learn the DDAG is $n=\Theta(q\log(p/q))$, where $p$ is the number of nodes and $q$ is the maximum number of parents per node. To prove the sample complexity upper bound, a concentration bound for the PSD estimation is derived, under two different sampling strategies. A matching min-max lower bound using generalized Fano's inequality also is provided, thus showing the order optimality of the proposed algorithm.  ( 3 min )
    On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint. (arXiv:2308.16425v1 [cs.LG])
    Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Built upon this, we establish the equivalence between implicit and explicit networks in high dimensions.  ( 2 min )

  • Open

    [D][R] Best way to upsample features in a neural network
    Hi fellow computer scientists, ​ 1) I have been wondering if there is a preferred way to upsample features. I though about 3 options: ​ 1.a) Upsample Layer + Conv Layer ​ 1.b) Transposed Conv Layer + Conv Layer ​ 1.c) PixelShuffle Layer + Conv Layer ​ 2) Also, considering option 1.c, should the Conv layer multiply the number of PixelShuffle output features by the scale factor because PixelShuffle does reduce the number of output features? i.e. I have a tensor of dims (B, C, W, H, D) and with shape (1, 60, 64, 64, 64). After the Pixel-shuffle with an upscale factor of 4 I get the tensor of shape (1, 15, 256, 256, 256). Afterwards the following Conv layer should output a tensor like: 2.a) (1, 15, 256, 256, 256), where in_channels=15 and out_channels=15 2.b) (1, 60, 256, 256, 256), where in_channels=15 and out_channels=60 Note the second option reinstates the number of input features. ​ 3) I have an additional question that can happens in both 1.a, 1.b and 1.c options. Imagine I need to upsample my features by a factor of 8. 3.a) Is it preferred to have multiple upsample blocks (Upsample Layer + Conv layer), where the upsample layers have a scale factor of 2, thus for this example we would have 3 upsample blocks (2 ** 3 = 8). 3.b) Have only one upsample block where the Upsample layer has the full scale factor desired and it is then followed by one Conv layer. ​ Thank you all :) submitted by /u/Christs_Elite [link] [comments]  ( 9 min )
    [D] Am I the only one finding this a bit upsetting?
    Hello everyone, In the process of writing up a literature review for my master's thesis, I wanted to cover the impact of ReLU on the field which was significant. When looking for an original paper I came across this paper/report: https://arxiv.org/abs/1803.08375. There isn't anything special about this work and as a matter of fact, I was surprised that it has thousands of citations (2974 at the moment of writing this post according to Google Scholar). Given this and that this work is not an original ReLU paper but more of a file documenting an implementation of it for a particular setup I found it quite intriguing. Then I started to dig into works that cited this and unexpectedly papers from top conferences such as NeurIPS cited the aforementioned document as a reference to the activation function. Here are some examples: https://proceedings.neurips.cc/paper_files/paper/2022/file/fbb10d319d44f8c3b4720873e4177c65-Paper-Conference.pdf https://proceedings.neurips.cc/paper_files/paper/2022/file/69e2f49ab0837b71b0e0cb7c555990f8-Paper-Conference.pdf The researchers who have done that are not referencing the original ReLU paper instead which I think is a bit disrespectful towards the achievement of original authors. On the other hand, maybe I am overthinking it a bit. ReLU has been around for a while and it would be surprising for someone conducting research in deep learning to not knowing it hence as a reader I wouldn't necessarily mind if people did not include the reference to the paper which is widely known. However, I reckon if a reference is made, then it should be meaningful and correct, and not just another extra few lines in a bibliography making it look big. submitted by /u/dj_giga_chinol [link] [comments]  ( 9 min )
    [D] best local LLM for answering to custom document
    Hi guys, I'm developing a local tool able to reply question related to one or more document. I found a good solution in using sentence embedding followed by similarity search to include only the most significative part of the document in the prompt. In this contest I search for the lightest LLM able to reply to this question. For example, LLM based on Bert are generally smaller but are they good enough? I'm not an expert in this field, I hope I give you meaningful information. Thanks! 🙏 submitted by /u/Tough-Assistant-9740 [link] [comments]  ( 9 min )
    [D][R] New to ML Research, how often are you disheartened when something you have been working on for months does not work out ? and how do you deal with it ?
    I am new to research in ML, at present a grad student and began working in a lab on my own work. My advisor is very understanding, supportive and took a leap of faith to fund me, since I did not have prior experience in research. I have been working on a problem for 4 months now and have been getting poor results for the past week. All the literature surveys, digressions within the problem statements and running the experiments to end up with not-so-good results is extremely disheartening. ​ I am still continuing to run additional experiments, figuring out where things can be going wrong and trying to conduct further analysis, but I feel like I have let down my advisor. I still have the entire semester to work on it and possibly other stuff, I am motivated for it, but at times ponder over the huge chunk of time I have spent on the current work. ​ How do you deal with such results and hitting the wall in your research ? Does it happen often ? What would you advice I do to continue working ? submitted by /u/V1bicycle [link] [comments]  ( 9 min )
    Which text to speech is this? [D]
    https://youtube.com/shorts/mRZMOFqD0F0?si=jyHQVwq2ouAKP1t9 submitted by /u/AdGeneral5378 [link] [comments]  ( 9 min )
    [D] suggestion for AI tools (chat style) that run on-prem with vectorDB?
    Hi, I'm looking to run an on-prem ChatGPT style LLM solution that can ingest private customer data into a VectorDB. So far I have tried three... GPT4All - limited to only allows for up to 13b parameter LLMs and only on CPUs (currently), also its 'localdocs' implementation I've found to only reference its docs very infrequently when answering. H2OGPT - it's implementation of localdocs (I believe via LangChain) seems pretty good. but seems like every time I run an instance, I would have to re-vector my documents. Not sure if there is a way to attach an VectorDB to it so it's ready to go right away. PrivateGPT - seems to work very well, currently it's only running on CPUs thus response time is over a minute. Curious if the community knows of any other products that do this and are already GPU accelerated. ​ TY in advance. ​ ​ submitted by /u/konrad21 [link] [comments]  ( 9 min )
    [D] How to improve my Support Vector Machine (SVM) Paper?
    Hi guys, Seeking some advice from some experienced researchers in support vector machines and kernel methods. I made this paper that breaks down using multi-class SVM in a One Against All approach, how to solve them with Lagrange multipliers https://github.com/jacobmcasey/MultiClass-SVM-Lagrange-Hyperplane-Construction-Paper As it currently stands it’s more a nice educational resource on the topic, rather than a novel contribution. Any ideas how to extend this work into something a bit more impactful? Thanks submitted by /u/Ok_Reality2341 [link] [comments]  ( 9 min )
    [P] Efficient way to implement sparse cross-attention
    I have key-value pairs with an extensive sequence length, alongside a sparse attention mask that is data-dependent, with fewer than 5% of its elements being non-zero. I found out that Xformer has implememation for sparse self-attention (link) but not sure whether the same would work for cross-attention. Also Xformer supports only (fixed) 2D attention mask but in my case the mask is arbitary and is different for different input. Can you suggest an efficient implementation for my scenario? submitted by /u/ankanbhunia [link] [comments]  ( 9 min )
    [D]Why are special tokens not allowed in the prompt for llama-2?
    I was going through the code for Llama-2 text generation on the official github where I stumbled across this code in the generation.py file: B_INST, E_INST = "[INST]", "[/INST]" SPECIAL_TAGS = [B_INST, E_INST, ">", ">"] UNSAFE_ERROR = "Error: special tags are not allowed as part of the prompt." ... ... ... unsafe_requests = [] unsafe_requests.append(any([tag in msg["content"] for tag in SPECIAL_TAGS for msg in dialog])) ... ... ... return [ { "generation": { "role": "assistant", "content": self.tokenizer.decode(t) if not unsafe else UNSAFE_ERROR, } } for t, unsafe in zip(generation_tokens, unsafe_requests) ] Is there a reason why we can't have these tokens in the prompt? I am planning to bypass the role based dictionary entries for the prompt and instead building my own prompt generator that'll take the the system prompts and the user prompts and generate a single string to then send to the LLM. Depending on the the user's choice I want the LLM to generate concise or detailed answers(also impose a word limit in the prompt itself), so I am planning to have this as a dropdown a user can choose. based on the system option chosen(concise/detailed answer), I then want to call my prompt generator which will add the instruction tags around the "system" and "user" prompts to generate 1 string I can then pass to the LLM. I wanted to know if there was any reason these tags aren't allowed to be in the prompt. Is it only to avoid "confusion" on the different roles and following a conventional way to pass the prompts? If not, and there's a reason those tags aren't supposed to be passed inside the prompts, please do let me know,, because inside the same file the chat_completion() function is doing exactly that; adding the > and > around the system prompts and prepending it to the user prompt. submitted by /u/comical_cow [link] [comments]  ( 10 min )
    "[P]" Machine Unlearning: A Novel Framework to Unlearning, Privacy and Defending Against Inference Attacks
    Hey everyone, ​ I am excited to present my latest venture, an initiative aimed at exploring the still-murky waters of Machine Unlearning. While this new project shares its roots with our previous endeavors in biomimetic machine learning, it diverges to concentrate on the fascinating and complex issue of algorithmic forgetfulness. ​ 🎯 **Objective** ​ The cornerstone of this project is not just to create algorithms that can forget, but to do so in a way that's both efficient and secure. Our vision transcends mere algorithmic performance, embracing a multi-faceted approach that also covers privacy protections and robust defenses against model inference attacks. The ambition here is to fortify machine unlearning with a well-rounded, secure architecture, allowing it to handle real-world …  ( 10 min )
    [P] Modular Diffusion: A Python Library for Designing and Training Diffusion Models with PyTorch
    Hello everyone! I've been working on this project for a few months as part of my thesis in Machine Learning. It's a library that provides an easy-to-use yet flexible API to design and train Diffusion Models. I decided to make it because I wanted to quickly prototype a Diffusion Model but there were no good tools to do it with. I think it really can help people prototype their own Diffusion Models a lot faster and only in a few lines of code. The idea is to have a model class that takes different modules corresponding to the different aspects of the Diffusion Model process (noise schedule, noise type, denoising network, loss function, guidance, etc.) and allow the user to mix and match different modules to achieve different results. The library ships with a bunch of prebuilt modules and the plan is to add many more. I also made it super easy to implement your own modules, you just need to extend from one of the base classes available. Below is an example of the type of interface you can expect. I'd really appreciate your feedback! Check out the project here: https://github.com/cabralpinto/modular-diffusion https://preview.redd.it/0itvswxkknlb1.png?width=2528&format=png&auto=webp&s=24ce67955eadb5cf109d19716f4e5a9471b1572d submitted by /u/secularchapel [link] [comments]  ( 9 min )
    [D] What does "the actual number of English-language words" mean?
    D3PM paper https://arxiv.org/pdf/2107.03006.pdf reports perplexity on LM1B dataset. In Appendix B.2 thay authors say: Perplexities are reported relative to the actual number of English-language words in the test set (including an EOS token predicted by the model) How did they compute this number? Did they split sentences by space? Are punctuation symbols considered "English words"? Are chinese characters (which are present in the data) withous spaces counted as one word? Or is it some common knowledge that "LM1B test set contains X words"? The official implementation https://github.com/google-research/google-research/tree/master/d3pm/text is extremely difficult to comprehend. I spent several hours reading throug the code and I still have no idea how they computed the number of words. submitted by /u/Tomarchelone [link] [comments]  ( 9 min )
    [P] Significant improvements for multi-agent reinforcement learning!
    We've just released a new version of our evolutionary hyperparameter optimization RL framework, which is 10x faster than SOTA! This update is focused on multi-agent RL. We've introduced MADDPG and MATD3 to the framework. These algorithms are traditionally super brittle, and RLlib even recommends not to use their own implementation of it. However, our evolutionary framework has solved this problem! You can now train multiple agents in co-operative or competitive Petting Zoo-style (parallel API) environments, with significantly faster training and up to 4x improvement in total return when benchmarked against alternatives. Please check it out! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 9 min )
    [P] Interactively explore unstructured datasets from your dataframe (OSS project)
    Hey r/MachineLearning, data inspection and interactive exploration is one of the most important tasks for data teams. This is especially true when dealing with unstructured data that requires a deep domain expertise (e.g. healthcare or engineering). We have tried many different options for visualizing unstructured datasets in the past: Notebooks, dash apps, custom react apps, HTML reports... However, these options were either very time-consuming to develop/maintain or not interactive enough or both. That is why we developed Spotlight: https://github.com/Renumics/spotlight https://i.redd.it/lxjnlkcmumlb1.gif Spotlight supports most unstructured data types including images, audio, text, videos, time-series and geometric data. You can find more info and use case examples for ML and engineering workflows in the repo. Happy to hear your honest feedback! ​ ​ submitted by /u/44sps [link] [comments]  ( 9 min )
    [D] How many target variable classes does sentiment analysis models BERT and RoBERTa have?
    Hi everyone, so I am a little confused on how many target variable classes does the BERT and RoBERTa models have? So I understand these 2 models are pre-trained models, which means the number of target variable classes are fixed (if I am not wrong!). For example, the link below for the RoBERTa model in Hugging Face has fixed 3 target variable classes (Negative, Neutral and Positive): https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest But when I googled around and also asked ChatGPT and Bard, they tell me these models can have as many target variable classes as the user wants (or rather this depends on how many target variable classes there are in the training dataset). If these are pre-trained models already (which already have the number of target variable classes pre-determined in the model already), then how come some of the google sites and ChatGPT and Bard is telling me the user can choose however many target variable classes that they want? ​ submitted by /u/--leockl-- [link] [comments]  ( 9 min )
    [P] Vehicles moving in wrong direction.
    I am working on a professional project which involves detecting a vehicle moving in a wrong direction on the street. Data details : I have consecutive frames of the street in which vehicles are moving. So far : I have created a model that detects the objects inside the frame and give me the coordinates(bounding-box) of those objects(vehicles). And I using the optical flow to produce the optical lines on the objects which are moving inside the consecutive frames and I am also able to get the direction(if a object is moving from top to bottom on the frame it means it is in right direction and if bottom to top in the frame it means wrong direction based on change in y-offset of the object). Now the optical lines code is different which is giving me the direction of the object(I am not using any model to detect object in this code it's based on Lucas-Kadane method) and when I say direction I mean I'm using the cv2.imshow() which actually plays the consecutive frames together and draw optical lines on it and shows me the direction visually. Now the problem is I want the coordinates of the object that is moving in wrong direction (the bounding box coordinates) how can I achieve that? Any suggestions and ideas would be helpful please mind I can't use any other technology or model as this are project requirements in professional setting) submitted by /u/Sherlock_holmes0007 [link] [comments]  ( 9 min )
    [D] How to decide number of trees in hyperparameter tuning?
    The dataset I have consists of around 2,300 observations and 120 variables, of which around 25 are highly correlated, so I narrowed it down to 95 variables. I'm using R's boost_tree() with xgboost as my model. How do I decide when to stop tuning for number of trees, mtry, min_n, and tree depth, without actually overfitting the data? Because as I increase the number of trees (or any other variable like the ones above), my RMSE obviously goes down, but how do I know it is overfitting the training data? Or is there no overfitting in this case, since I am using cross validation (15 fold) already? PS, the test data is 800 observations submitted by /u/heeeehuuuu [link] [comments]  ( 9 min )
    [D] Feature extraction in multivariate time series
    How do you usually do feature extraction for time series data? I used to work on visual domain, so I'm pretty familiar with CVs but recently I was assigned some tasks on multivariate time series data and it's been quite difficult getting used to. Major problem is, while features in vision have semantic meaning not just along temporal axis but also "spatially," the multivariate time series does not. Also, is it considered a "cheating" if i pre-extract certain features that are already established by experts to have high correlation with the result, rather than letting the machine learning algorithm learn on its own those "certain features" in some way through training? Thanks! submitted by /u/-273deg [link] [comments]  ( 9 min )
    [D] MLOps resources
    Hi, I wonder which books or courses you would recommend for intermediate and advanced MLOps/ML design systems. What I mean is topics like handling hundreds of models and their updates, reusable CI/CD pipelines, batch and online architectures, integration with feature stores, jobs/queues for model scheduling, data drift, metrics monitoring, and alerts, and so on. This would be for someone familiar with the major concepts, hands-on experience with MLflow, SageMaker, Azure ML services, Databricks and similar tools. submitted by /u/rodrigo-arenas [link] [comments]  ( 9 min )
    [D] RLHF for multi-turn conversation, Option A or B?
    I have a dataset that consists of dialogues between a user and a chatbot (ChatGPT), and I want to use this data to implement Reinforcement Learning from Human Feedback (RLHF). I have already completed Supervised Fine-Tuning (SFT) and built the reward model. Now, I need some guidance on how to handle the data. Here is an example of the pre-collected data: >> User: Give me a tip on how to succeed in drawing. >>> ChatGPT: Practice regularly and be patient with yourself. Improvement takes time. >>>User: But drawing is hard. >>>ChatGPT: It is, and that's okay. It's normal to find it challenging, especially when you're just starting out. Just try to enjoy the process and don't be too hard on yourself. ​ ==========My input Data 1 is :>> User: Give me a tip on how to succeed in drawing. Suppose my model outputs the following for Input Data 1: ChatGPT: Practice makes perfect. My question is, for Input Data 2, should I use: Option A: User: Give me a tip on how to succeed in drawing. ChatGPT: Practice makes perfect. User: But drawing is hard. In this option, I use the actual previous term's agent output and append the pre-collected user data. Or Option B: User: Give me a tip on how to succeed in drawing. ChatGPT: Practice regularly and be patient with yourself. Improvement takes time. User: But drawing is hard. In this option, I use all the pre-collected data, which might not even be the current model's output. Which option is more appropriate for RLHF, A or B? submitted by /u/No_Oilve_6577 [link] [comments]  ( 9 min )
    [D] Why did the authors design this gradient reversal layer in the paper "Unsupervised Domain Adaptation by Backpropagation"?
    I am reading the famous paper " Unsupervised Domain Adaptation by Backpropagation" again, but still got confused why the authors had to design this gradient reversal layer. To my understanding, simply adding a minus-one (-1) in front of the domain classifier head is good enough. Of course, we need to minimize the original domain classifier head at some point to make it decent. For example, if it is a two-step training like GAN, we can (1) Freeze other parts but only minimize the domain classification loss to update the domain classifier head; and then (2) Freeze the domain classification head, but maximize the domain classificatoin loss to update the feature extractor. We can alternate between (1) and (2). Is the main motivation of gradient reversal layer that we can merge (1) and (2) into a single training step? submitted by /u/AaronSpalding [link] [comments]  ( 9 min )
    [D] Dataset condensation
    Hello everyone, has anyone here read the paper "Dataset Condensation with Gradient Matching"? I've been reading it, but I got stuck trying to understand how they transition from the point where the loss is the distance between parameters to the point where the loss is the distance between gradients. Could someone please explain this process in detail? Apparently, they make the assumption that the initializations are the same and that the distance between parameters is close to zero for every iteration, but I'm still struggling to comprehend how they arrive at the conclusion that the distance is now between gradients. submitted by /u/Ok-Cartographer-1363 [link] [comments]  ( 9 min )
    "[P]" A Scientific Exploration into the Integration of Biomimicry Principles within Machine Learning Algorithms
    Hey everyone, I am excited to introduce a project that delves into the experimental fusion of Biomimicry principles with Machine Learning algorithms. While the concept of unlearning serves as our initial prototype, the overarching ambition extends far beyond, aiming to pioneer new methodologies inspired by natural phenomena. 🎯 Objective The core objective of this research is to investigate the feasibility and efficacy of incorporating biomimetic principles into machine learning algorithms. The goal is not merely to improve algorithmic performance but also to introduce novel methods that can tackle complex computational problems, much like how nature solves intricate issues in an energy-efficient manner. --- 📑 Methodological Outline **Conceptual Framework**: The project adopts a…  ( 10 min )
  • Open

    Andrew Ng doesn't think RL will grow in the next 3 years
    From his latest talk on AI, he has ever field of ML growing in market size / opportunities except for RL. Do people agree with this sentiment? Unrelated, it seems like RL nowadays is borrowing SL techniques and apply to offline datasets. submitted by /u/wardellinthehouse [link] [comments]  ( 9 min )
    Achieving 4000x Speedups with PureJaxRL
    submitted by /u/shrekkertech [link] [comments]  ( 9 min )
    Does anybody know why gym environments are opening in not secure window on my browser?
    ​ https://preview.redd.it/ina42mr5wplb1.png?width=1298&format=png&auto=webp&s=f65fe5eade9dc1f3312ff280436e6e0a5ba6e380 submitted by /u/nimageran [link] [comments]  ( 9 min )
    Question about forward-view TD compares to planning in model-based RL
    I have a confusion in difference between forward-view TD sampling and model-based RL. Assuming using approximation function. In forward-view TD (more than one step), the reward sampling is the future estimation in according to the currently policy (kind like searching the best situation). What is the different between the forward-view TD which likely to be planned by the policy (assuming greedy) and the model-based RL which planned by the model of fake environment? Does the only difference is model-based able to predict the result of action in 1-2-3 step in the future (in agent's head) from the transition model where model-free rely on the approx. function? submitted by /u/AnnonymeowCat [link] [comments]  ( 9 min )
    suggestion for AI tools (chat style) that run on-prem and allow for vectorDB input?
    Hi I'm looking to run an on-prem ChatGPT style LLM that can ingest private customer data via a VectorDB. So far I have tried three... GPT4All - limited to only allows for up to 13b parameter LLMs and only on CPUs (currently), also its 'localdocs' implementation I've found to only reference its docs very infrequently when answering. H2OGPT - it's implementation of 'localdocs' (I believe via LangChain) seems pretty good. but seems like every time I run an instance, I would have to re-vector my documents. Not sure if there is a way to attach an VectorDB to it so it's ready to go right away. PrivateGPT - seems to work very well, currently it's only running on CPUs thus response time is over a minute. Curious if the community knows of any other products that do this and are already GPU accelerated. ​ TY in advance. ​ ​ submitted by /u/konrad21 [link] [comments]  ( 9 min )
    Significant improvements for multi-agent reinforcement learning!
    We've just released a new version of our evolutionary hyperparameter optimization RL framework, which is 10x faster than SOTA! This update is focused on multi-agent RL. We've introduced MADDPG and MATD3 to the framework. These algorithms are traditionally super brittle, and RLlib even recommends not to use their own implementation of it. However, our evolutionary framework has solved this problem! You can now train multiple agents in co-operative or competitive Petting Zoo-style (parallel API) environments, with significantly faster training and up to 4x improvement in total return when benchmarked against alternatives. Please check it out! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 9 min )
    OpenLLaMA2 - A DeepSpeed/Ray based LLaMA2 RLHF framework
    https://github.com/OpenLLMAI/OpenLLaMA2 We can fully fine-tune a 7B llama2 RLHF model with a Single A100 GPU. We are also developing a Ray-based RLHF framework for > 30B models. We provide PPO hyperparameters out of the box. Welcome to participate in the development and star OpenLLaMA2. ​ Some PPO vs. SFT examples. https://github.com/OpenLLMAI/OpenLLaMA2/blob/main/docs/ppo_examples.md SFT Human: Please introduce the game GTA5 Assistant: GTA5, or Grand Theft Auto V, is an action-adventure video game developed by Rockstar North and published by Rockstar Games. It was released in 2013 for the PlayStation 3 and Xbox 360, and later in 2014 for the PlayStation 4 and Xbox One. The game is set in the fictional city of Los Santos, based on Los Angeles, and follows the story of three crimi…  ( 10 min )
    RLHF, option A or B, plz help me choose
    I have a dataset that consists of dialogues between a user and a chatbot (ChatGPT), and I want to use this data to implement Reinforcement Learning from Human Feedback (RLHF). I have already completed Supervised Fine-Tuning (SFT) and built the reward model. Now, I need some guidance on how to handle the data. Here is an example of the pre-collected data: >> User: Give me a tip on how to succeed in drawing. >>> ChatGPT: Practice regularly and be patient with yourself. Improvement takes time. >>>User: But drawing is hard. >>>ChatGPT: It is, and that's okay. It's normal to find it challenging, especially when you're just starting out. Just try to enjoy the process and don't be too hard on yourself. ​ ==========My input Data 1 is :>> User: Give me a tip on how to succeed in drawing. Suppose my model outputs the following for Input Data 1: ChatGPT: Practice makes perfect. My question is, for Input Data 2, should I use: Option A: User: Give me a tip on how to succeed in drawing. ChatGPT: Practice makes perfect. User: But drawing is hard. In this option, I use the actual previous term's agent output and append the pre-collected user data. Or Option B: User: Give me a tip on how to succeed in drawing. ChatGPT: Practice regularly and be patient with yourself. Improvement takes time. User: But drawing is hard. In this option, I use all the pre-collected data, which might not even be the current model's output. Which option is more appropriate for RLHF, A or B? submitted by /u/No_Oilve_6577 [link] [comments]  ( 10 min )
  • Open

    Does anyone know if an AI can help me?
    My friend has a picture of herself from a while ago with a fake tattoo. She had the tattoo made from an original image, but she doesn't have it anymore. Is there an AI that could take the tattoo from the picture that is on her body and make it into a 2d version that can be made into a tattoo guide? submitted by /u/StitchTheFox [link] [comments]  ( 9 min )
    AI System Can Predict Chemical Smells Based on Molecular Structures
    A new study cites the creation of an AI system that can predict how a specific compound will smell by analyzing its molecular structure. You can check it out here. If you want to stay on top of the latest trends and insights in AI, look here first. Why is this significant? The AI system, developed by researchers at startup Osmo, can utilize 55 descriptive words to assign a smell or 'aroma' to a chemical compound or 'odorant'. This breakthrough might be utilized to enhance the food and cleaning product industries where synthetic scents play an essential role. What’s next for this AI system? The AI's predictions often aligned closer with human consensus than any individual guess, indicating its robustness and potential. The next step for this research is to comprehend how different odorants mix and compete to yield a smell that the human brain identifies as unique. However, the sheer number of combinations, even with a small set of odorants, poses a daunting task. To quote Stuart Firestein, a neurobiologist at Columbia University, “Predicting what a mix smells like is the next frontier.” P.S. If you like this kind of analysis, you’ll love my free newsletter that tracks the most relevant news and research in AI and tech. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    AI — weekly megathread!
    News provided by aibrews.com Researchers introduce ‘Swift’, the first autonomous vision-based drone that beat human world champions in several fair head-to-head races. This marks the first time that an autonomous mobile robot has beaten human champions in a real physical sport [Details]. Generative AI updates from Google Cloud Next event: General availability of Duet AI in Google Workspace [Details]. SynthID - a tool for watermarking and identifying AI images generated by Imagen (Google’s text-to-image diffusion model). It embeds a digital watermark directly into the pixels of an image, making it invisible to the human eye, but detectable for identification, without reducing the image quality [Details]. AlloyDB AI for building generative AI applications with PostgreSQL [Details]. …  ( 11 min )
    AI-powered hate speech detection will moderate voice chat in Call of Duty
    submitted by /u/SAT0725 [link] [comments]  ( 9 min )
    Generative AI could potentially automate up to 75 million global jobs, ILO Study Finds
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    How to generate movies using gen AI/prompts?
    I bet there’s a genius research team out there that started work on this How cool/crazy would that be? submitted by /u/AILaunchpad [link] [comments]  ( 9 min )
    Odd Bing conversation turn
    This happened. Was NOT aware the already extensive and tiresome limitation in discussion subjects was THIS pervasive, and frankly, this fragile egoed. Really? THIS is "controversial?" submitted by /u/HotaruZoku [link] [comments]  ( 9 min )
    TinyTap rolls out new AI features for educators and parents
    submitted by /u/baillyjonthon [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/31/2023
    Forget smartwatches, Microsoft may make a backpack with an AI assistant.[1] Call of Duty will use AI to moderate voice chats.[2] OpenAI Introduces Special Tutor Prompts To Implement ChatGPT In Classrooms.[3] Google Meet’s new AI will be able to go to meetings for you.[4] Sources: [1] https://www.windowscentral.com/software-apps/forget-smartwatches-microsoft-may-make-a-backpack-with-an-ai-assistant [2] https://www.theverge.com/2023/8/30/23852652/call-of-duty-activision-modulate-toxmod-artificial-intelligence-voice-moderation [3] https://robots.net/news/openai-introduces-special-tutor-prompts-to-implement-chatgpt-in-classrooms/ [4] https://www.theverge.com/2023/8/29/23849056/google-meet-ai-duet-attend-for-me submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    We will take our symbiosis with animals to the next level.
    submitted by /u/kipaxbooks [link] [comments]  ( 9 min )
  • Open

    How to Save a Neural Network Model in Python Tensorflow?
    submitted by /u/aheadMake57 [link] [comments]  ( 9 min )
  • Open

    Elevating the generative AI experience: Introducing streaming support in Amazon SageMaker hosting
    We’re excited to announce the availability of response streaming through Amazon SageMaker real-time inference. Now you can continuously stream inference responses back to the client when using SageMaker real-time inference to help you build interactive experiences for generative AI applications such as chatbots, virtual assistants, and music generators. With this new feature, you can start streaming the responses immediately when they’re available instead of waiting for the entire response to be generated. This lowers the time-to-first-byte for your generative AI applications. In this post, we’ll show how to build a streaming web application using SageMaker real-time endpoints with the new response streaming feature for an interactive chat use case. We use Streamlit for the sample demo application UI.  ( 12 min )
    FMOps/LLMOps: Operationalize generative AI and differences with MLOps
    Nowadays, the majority of our customers is excited about large language models (LLMs) and thinking how generative AI could transform their business. However, bringing such solutions and models to the business-as-usual operations is not an easy task. In this post, we discuss how to operationalize generative AI applications using MLOps principles leading to foundation model operations (FMOps). Furthermore, we deep dive on the most common generative AI use case of text-to-text applications and LLM operations (LLMOps), a subset of FMOps. The following figure illustrates the topics we discuss.  ( 23 min )
  • Open

    Fast-tracking fusion energy’s arrival with AI and accessibility
    MIT Plasma Science and Fusion Center will receive DoE support to improve access to fusion data and increase workforce diversity.  ( 8 min )

  • Open

    Thoughts on ZTM? [D]
    Thoughts on the Zero to Mastery programs? There is a machine learning bootcamp course on Udemy that is part of that program. I feel like i've heard negative reviews about them in the past, but it's only 12.99 right now and I feel like it covers a lot of content. So I guess I'm just wondering if it's really that bad, or if the course would be worth my time? Would it really take me from "Zero to Mastery"? Thanks submitted by /u/Mountain-Economy1476 [link] [comments]  ( 9 min )
    Math for ML Course on Udemy [D]
    Are there any good math for machine learning courses on Udemy? I specifically want a course that offers lots of exercises so I am able to practice what I learn. Thanks submitted by /u/Mountain-Economy1476 [link] [comments]  ( 9 min )
    "[D]" A Scientific Exploration into the Integration of Biomimicry Principles within Machine Learning Algorithms
    Hey everyone, I am excited to introduce a project that delves into the experimental fusion of Biomimicry principles with Machine Learning algorithms. While the concept of unlearning serves as our initial prototype, the overarching ambition extends far beyond, aiming to pioneer new methodologies inspired by natural phenomena. --- 🎯 **Objective** The core objective of this research is to investigate the feasibility and efficacy of incorporating biomimetic principles into machine learning algorithms. The goal is not merely to improve algorithmic performance but also to introduce novel methods that can tackle complex computational problems, much like how nature solves intricate issues in an energy-efficient manner. --- 📑 **Methodological Outline** **Conceptual Framework**: The proje…  ( 10 min )
    [P] We embedded all SEC and Press Releases data for US companies, it is available for retrieval
    Retrieval augmented generation (RAG) is one of the most popular way to add additional knowledge to your LLMs. To do RAG well, you need to do three things well - Curate high quality datasets Create abstractions (embeddings, keyword indexes, knowledge graphs) Stitch everything together for better retrieval We have realized that it is even harder than what it looks like. We want to easily enable this infra for a range of datasets, starting with company-specific data. You can give it a go here on our playground or get started with our open sourced library submitted by /u/achyutjoshi [link] [comments]  ( 9 min )
    [D] Anyone submitted to CPAL?
    There was a paper submission deadline for Conference on Parsimony and Learning (CPAL) earlier this week. This is their first conference so I expect the number of submissions to be very small, but has anyone submitted? I am guessing they received like 100 or 200ish submissions. submitted by /u/neurogramer [link] [comments]  ( 9 min )
    [D] Best frameworks and tools to design ml based web applications
    As the title says, I'm looking for a list of the best tools and framework to learn, useful for build machine learning solution as web application. I want to move my projects from being jupyter notebooks using tensorflow or pytorch, to ml API and applications. submitted by /u/AcquaFisc [link] [comments]  ( 9 min )
    [D] How do you track what you learnt from the papers?
    It has always been a struggle for me. I tried to take notes as I read paper, but that’s not quite sustainable because it’s difficult to track where did the notes come from for more details. Or I highlight the sections with added comments but that’s also not quite accessible when you have tones of pdf lying around somewhere or worse print outs. Recently I’ve been trying a cloud based pdf reader that stores my papers and allow searches over all highlights and comments (Pond) Thinking if I could also use it to share papers with my colleagues but I’m not sure if it will work because that will require them to use it as well. How do you solve this ? submitted by /u/dockerun [link] [comments]  ( 9 min )
    [D] need dataset for my research project
    I am working on a project for my research but need a dataset which contains the generation and consumption of electricity for Micro Hydro Power station, anyone could help me. I will be grateful submitted by /u/Due-Draft6855 [link] [comments]  ( 9 min )
    [N] Supporting the Open Source AI Community
    https://a16z.com/2023/08/30/supporting-the-open-source-ai-community/ From the text: We believe artificial intelligence has the power to save the world—and that a thriving open source ecosystem is essential to building this future. Thankfully, the open source ecosystem is starting to develop, and we are now seeing open source models that rival closed-source alternatives. Hundreds of small teams and individuals are also working to make these models more useful, accessible, and performant. These projects push the state of the art in open source AI and help provide a more robust and comprehensive understanding of the technology. They include: instruction-tuning base LLMs; removing censorship from LLM outputs; optimizing models for low-powered machines; building novel tooling for model inference; researching LLM security issues; and many others. However, the people behind these projects often don’t have the resources available to pursue their work to conclusion or maintain it in the long run. The situation is more acute in AI than traditional infrastructure, since even fine-tuning models requires significant GPU computing resources, especially as open source models get larger. ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [R] LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models - University of Illinois 2023
    Paper: https://arxiv.org/abs/2308.16137 Abstract: In recent years, there have been remarkable advancements in the performance of Transformer-based Large Language Models (LLMs) across various domains. As these LLMs are deployed for increasingly complex tasks, they often face the needs to conduct longer reasoning processes or understanding larger contexts. In these situations, the length generalization failure of LLMs on long sequences become more prominent. Most pre-training schemes truncate training sequences to a fixed length (such as 2048 for LLaMa). LLMs often struggle to generate fluent texts, let alone carry out downstream tasks, after longer contexts, even with relative positional encoding which is designed to cope with this problem. Common solutions such as finetuning on longer …  ( 9 min )
    [R] CoTracker: A Revolutionary 2D Point Video Tracker
    CoTracker - a 2D point-tracking tool for videos - promises to revolutionize motion tracking. Through the use of a transformer network, it meticulously predicts point trajectories and visibility across video frames, giving insights like never before. https://i.redd.it/g0u5t9n1ehlb1.gif Here's why CoTracker is turning heads: CoTracker leverages advanced transformer formulation: Utilising a grid of input tokens that evolve to output tokens, CoTracker allocates initial values derived from the track's start point and time. It's built to handle extended videos through 'windowed inference': Windowing enables the algorithm to handle videos beyond its maximum window length by splitting them into overlapping segments. 'Unrolled Learning' caters to semi-overlapping windows effectively: By employing two unique types of losses, only a modest amount of windows are used in loss computation while still handling expansive videos at test time. Improved tracking through simultaneous multi-point selection: By tracking multiple points at once, CoTracker is able to better establish correlation and motion paths within videos. Despite its notable strengths, there are limitations. Its sliding-window approach cannot handle long-term occlusions that last longer than a window, and its transformer-based model has a high computational cost that grows quadratically with the number of tracked points. According to the authors, “The result is a flexible and powerful tracking algorithm that outperforms state-of-the-art methods in almost all benchmarks”. But it’s yet to be seen how it will perform in real-life tasks. What do you think? P.S. If you like this type of analysis, you might want to check this out. (arXiv) (GitHub) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [Project] Combining Prompt Engineering with Structured Inputs to LLMs to Generate Insights on Predictions from Binary Classifiers
    All of this is framed through the lens of improving your ability to understand if a prediction for an upcoming UFC match is good or not. Happy to dig further into the ML and processing around this. https://blog.wolftickets.ai/teaching-a-wolf-to-speak-transforming-fight-predictions-into-insights.html Any feedback is appreciated! submitted by /u/wolfticketsai [link] [comments]  ( 9 min )
    [P] Autolabel: data labeling with LLMs
    Hi everyone, Wanted to share an open source project we've been working on for the last few weeks: Autolabel is an open source Python library to label and enrich text datasets with LLMs (Large Language Models). Why? Access to clean, labeled data is a huge bottleneck for most ML/data science teams. From experiments across a variety of NLP tasks and datasets, we have found that the most capable LLMs are able to label data at better quality than human annotators, but 20-100x faster. Getting Started¶ You can get started with the library by defining a JSON config, and writing a few lines of code: from autolabel import LabelingAgent, AutolabelDataset agent = LabelingAgent('config.json') dataset = AutolabelDataset('dataset.csv', 'config.json') labels = agent.run(dataset) Installation guide Sample notebooks that show how to use the library for different labeling tasks. Technical report for benchmarking LLM and human annotator performance across a range of tasks and datasets. Call for Feedback We just open sourced this library, and are actively developing it. Feedback is very welcome and so are requests for features. You can open an issue on Github for bugs and request features submitted by /u/nihit-d [link] [comments]  ( 9 min )
    [D] Notation problem of equation 1 from the paper Axiomatic attribution for deep networks?
    In Equation 1 of the paper "Axiomatic Attribution for Deep Networks", the denominator of the gradient is $\partial x_i$ (See Eq1). However, according to the paper(with Eq2), shouldn't it be $\partial (x'_i + \alpha \times (x_i-x'_i))$ rather than $\partial x_i$? I found many following papers which refer to this paper also use the notation like this. Do I misunderstand something? submitted by /u/qjall [link] [comments]  ( 9 min )
    [Discussion] Leveraging Leaky Softplus Activation with Momentum-Based Optimizers like Adam for Efficient Neural Network Training
    In the realm of deep learning, the choice of activation functions and optimization algorithms can significantly impact the training process and the performance of neural networks. A relatively lesser-known gem in this landscape is the "leaky softplus" activation function, which, when paired with momentum-based optimizers like Adam, can lead to exceptionally efficient and effective training outcomes. The Leaky Softplus Activation Function The leaky softplus activation function combines the benefits of both linearity and non-linearity in a graceful manner. Defined as Math.Log(Math.Exp(x) + 1) + (x / 16), it smoothly transitions between a nearly linear response for negative inputs and a more pronounced non-linear response for positive inputs. This unique characteristic enables it to address…  ( 10 min )
    [P] DeepEval - Neural Framework For Testing LLMs
    Hi everyone, I built DeepEval - an open-source unit testing framework for LLMs in order to accelerate development and iteration. The problem When designing software applications, testing has always been critical for a lot of production applications. However - with the rise of LLM applications, the type of testing required needs to change in order to adapt for the large number of possible queries. We therefore built DeepEval in order to make it easy to write LLM tests in just 1 line of code. We hope this solution is of value to future teams when iterating on their RAG pipelines, migrating LLM models, testing their fine-tuned LLMs. The solution The DeepEval framework is as follows: We split up testing LLMs into 4 main sections: - Answer Relevancy (how relevant an answer is to a question) - measured using a question-answer bi-encoder. - Factual consistency (whether the generated answer is hallucinating) - measured using entailment from an NLI model - Conceptual similarity (when given a ground truth, how closely does it relate to it - for example How big is it? The size of an orange vs 20 square centimetres.) - measured using vector similarity - Bias, Toxic classification (measured through DL classifier models) I would love any feedback on what we are building here and welcome any OS contributions! submitted by /u/ConfectionSafe954 [link] [comments]  ( 9 min )
    [D] Training models when you have limited compute power
    I've been wanting to take a code chatbot model like starchat or codellama and tune it to our codebase, problem is all I have at work is a Mac with 8gb of RAM. I talk with my boss today and can ask for some stuff if I want and can give good reason. What's the most efficient way to get the compute I need to train the model. Any other advice on how to go about doing this is greatly appreciated submitted by /u/Kechup17 [link] [comments]  ( 9 min )
    [N] DINOv2 is now available under the Apache 2.0 license
    Meta AI has made their DINOv2 self-supervised learning method for training computer vision models truly open source by publishing it under Apache 2.0 license. DINOv2 has outperformed previous state-of-the-art self-supervised learning methods on a variety of computer vision tasks, including image classification, object detection, and semantic segmentation. It is also more efficient to train than previous methods, making it more accessible to researchers and practitioners. DINOv2 is different from existing methods because it provides a new way to train high-performance computer vision models without the need for labeled data. This makes it possible to train models on large datasets of unlabeled images, which can be more cost-effective and time-efficient than collecting and labeling large datasets of images. New demo: https://dinov2.metademolab.com/ submitted by /u/noiseinvacuum [link] [comments]  ( 9 min )
    [P] Deep reinforcement learning library to import multiple URDF robots and objects ?
    I have experience in deep learning but am a beginner in using deep reinforcement learning for robotics. However, I have recently gone through the huggingface course on deep reinforcement learning. I tried tinkering around with panda-gym but am having trouble trying to start my own project. I am trying to use two UR5 robots do some bimanual manipulation tasks e.g. have the left arm hold onto a cup while the right pours water into it. panda-gym allows me to import a URDF file of my own robot but I can't find the option to import my own objects like the xml file (or any extension) of a table or a water bottle. I have no idea which library allows me to import multiple URDF robots and xml objects and was hoping for some help. EDIT : I actually just read about Gazebo and was wondering if it'll allow me to do the above ? As a beginner I still have zero experience with ros and gazebo. submitted by /u/I_am_a_robot_ [link] [comments]  ( 9 min )
    [D] Optimizing simple distributions for something other than maximum likelihood
    As everyone knows, we usually optimise for maximum likelihood when fitting distributions like gaussians (equivalent to the forward KL-divergence). But for neural networks, techniques like GANs allow the minimisation of other distances like Mutual Information or Reverse KL. While this is certainly a very cool and insightful approach, it's also highly complex. I wonder wether other approaches to this problem exist for the simpler case, like fitting a gaussian or some other analytic distribution. From statistics, I have only encountered maximum likelihood and it's variations like robust statistics. submitted by /u/LeanderKu [link] [comments]  ( 9 min )
    [P] I created a package implementing a SOTA technique for XAI ( Explainable AI)
    This is the package https://github.com/mfumagalli68/xi-method Follow the README and install directly from pypi. From the paper: " [..]To bridge this gap we propose a family of measures of statistical association whose definition is well-posed also for nonordered data. Our intuition is to rely on separation measurements between probability mass functions. Here, by separation measurement we mean any distance or divergence between probability mass functions that is positive, and that is null if and only if the probability mass functions coincide. Then, we show that the new class of sensitivity indices complies with Renyi’s postulate D of measures of statistical dependence (Renyi, 1959). This postulate, called zero-independence property in the following, requires that a measure of associat…  ( 10 min )
  • Open

    "What Are Dreams For?" (twitching in fetal dreaming suggests dreams are offline RL for learning motor control)
    submitted by /u/gwern [link] [comments]  ( 9 min )
    DQN can't solve frozen lake environment
    Hello all, I am trying to solve the frozen lake environment using DQN. And I see two issues. One is that the loss falls down to zeros and second the agent only reaches the goal only 5 times in 1000 epochs. Here's my code. import numpy as np import tensorflow as tf from tensorflow.keras import layers, activations import matplotlib.pyplot as plt import gym def create_agent(num_inputs, num_outputs, layer1, layer2): inputs = layers.Input(shape=(num_inputs, )) hidden1 = layers.Dense(layer1)(inputs) activation1 = activations.relu(hidden1) hidden2 = layers.Dense(layer2)(activation1) activation2 = activations.relu(hidden2) outputs = layers.Dense(num_outputs, activation='linear')(activation2) model = tf.keras.Model(inputs, outputs) return model loss_mse = tf.keras.losses.MeanSquaredError() lear…  ( 10 min )
    "Echo Chess: The Quest for Solvability" (level design preference learning: predicting high-quality soluble mazes using human feedback from quitting rates)
    submitted by /u/gwern [link] [comments]  ( 9 min )
    [P] Library to import multiple URDF robots and objects ?
    I have experience in deep learning but am a beginner in using deep reinforcement learning for robotics. However, I have recently gone through the huggingface course on deep reinforcement learning. I tried tinkering around with panda-gym but am having trouble trying to start my own project. I am trying to use two UR5 robots do some bimanual manipulation tasks e.g. have the left arm hold onto a cup while the right pours water into it. panda-gym allows me to import a URDF file of my own robot but I can't find the option to import my own objects like the xml file (or any extension) of a table or a water bottle. I have no idea which library allows me to import multiple URDF robots and xml objects and was hoping for some help. submitted by /u/I_am_a_robot_ [link] [comments]  ( 9 min )
    Mini-Batch in PPO
    Hi I am struggling to understand the mini-batch in PPO. Say I already collected two trajectories. Traj_A = [t = 1, t= 2, t=3 ,.... t = 100] Traj_B = [t =1 , t=2, ... t= 78] Now, I heard you usually break this down onto mini-batch (say a batchsize of 6). Do you do random sampling? eg, one batch is [Traj_A_t=1, Traj_A_t=2, Traj_A_t=100, Traj_A_t=66, Traj_A_t=77, Traj_A_t=55]??? OR do you need to maintain some sequence [Traj_A_t=1, Traj_A_t=2, Traj_A_t=3, Traj_A_t=4, Traj_A_t=5, Traj_A_t=6]??? submitted by /u/No_Oilve_6577 [link] [comments]  ( 9 min )
  • Open

    every time i talk to llama 2 it sounds like its scared of getting punished
    submitted by /u/nicdunz [link] [comments]  ( 9 min )
    Breaking: US expands export restrictions on Nvidia AI chips to Middle East
    The US government has imposed expanded export restrictions affecting Nvidia’s leading artificial intelligence chips, curbing their exportation beyond China to certain Middle Eastern countries. If you want to stay on top of AI advances, look here first. https://preview.redd.it/xe7ho00t0ilb1.png?width=1240&format=png&auto=webp&s=61225931bf3e316efd90eab83846402d4148aca2 Why this matters: Nvidia’s A100 and H100 chips are affected: These AI chips are important and used to accelerate machine-learning tasks on major AI applications like ChatGPT. Despite the restrictions, Nvidia maintains they won’t have an “immediate material impact” on its results. Other companies, like AMD, are also affected: They’ve reportedly received similar restrictions notice, hinting at a broader move by the US government to control the distribution of AI chip technology. The move is part of a larger geopolitical play: These restrictions form part of the Biden administration’s efforts to curtail Beijing’s ability to capitalize on the AI revolution. How Nvidia and the industry might respond: Nvidia CEO Jensen Huang has cautioned the US: In a Financial Times interview, Huang warned that imposing such restrictions could lead to “enormous damage” to the US tech industry, predicting China may become self-sufficient in AI chip development. Yet, Nvidia still managed impressive earnings recently: Despite these challenges, Nvidia recently reported quarterly revenue of $13.5bn, exceeding predictions by $2bn. Further restrictions could significantly alter the landscape for AI development, potentially fostering greater innovation in countries affected or even a race to develop independent solutions. P.S. If you like this kind of analysis, you might want to check this out. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Baidu publicly releases their AI chatbot Ernie Bot
    In a bid to rival the United States’ stronghold in the AI industry, Chinese search engine and AI firm Baidu, has made its ChatGPT-equivalent language model, Ernie Bot, fully available to the public. This marks a significant move on the AI chessboard. If you want to stay on top of everything AI, look here first. https://preview.redd.it/g68sr07iihlb1.jpg?width=1024&format=pjpg&auto=webp&s=c0c873badd448257bcc2fb125188acc198e504d6 Why does this matter? Baidu's public release of Ernie Bot signals the company's aggressive push in the generative AI market. By opening up its model to the public, Baidu can leverage expansive real-world human feedback to improve Ernie Bot. China's determination to lead the AI industry is unabated, with many tech firms launching their own generative models in response to OpenAI's popular ChatGPT. Baidu's move further fuels this rivalry. Regulation in China seems to support such AI advancements. CEO Robin Li voiced his optimism about the AI regulations—calling them "more pro-innovation than regulation". What's the broader response? Baidu's latest stride has boosted its stock price by over 3%, underlining the market's high anticipation of Baidu's AI efforts. Ernie Bot has rocketed to the top of Apple's iOS free app chart in China. This demonstrates a positive initial response from the public. Regulation is key in China's AI game: China has stringent regulations for the generative AI industry, requiring a security review and government approvals before any product launch. Moreover, companies need to comply with governmental tech and data requests. The US, on the other hand, doesn't currently have such regulations in place. A markedly different approach that could significantly influence the development and application of AI technologies. If you like this kind of analysis, you might want to check this out. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    AI Seach
    I'm looking for a AI Search tool that replaces the search bar on a website. Search tool will scrape that sites data and offer suggestions. Any recommendations? submitted by /u/CauliflowerTiny1454 [link] [comments]  ( 9 min )
    SmartGPT: Major Benchmark Broken - 89.0% on MMLU + Exam's Many Errors
    submitted by /u/Sonic_Improv [link] [comments]  ( 9 min )
    Help Me Understand ChatGPT
    I'm currently researching how users interact with ChatGPT and its features, and I'd really appreciate your insights, experience, and perspective. Why should you participate? It's a quick 5-minute survey. Your identity and responses are completely anonymous. Your input will significantly contribute to important research on ChatGPT. The final research document will be posted to this sub. Survey Link: https://forms.gle/tNBib2dA1ErFEwbk6 Rest assured, all information will be confidential and only used for the purpose of this research. Thank you for your time submitted by /u/aaron-cesaro [link] [comments]  ( 9 min )
    Best AI to bypass Ai detection for essays and assignment
    So yeah it's an open book course, but I'm horrible at flow and grammar. I need to be able to fix these things without getting in trouble. Ten years ago in my undergrad friends and family would do the final proofreading for me to make small changes. Is undetectable reputable. submitted by /u/6ixsideOT [link] [comments]  ( 9 min )
    Chat with your favorite characters from movies, TV shows, books, history, and more (+ Discord bot)
    ​ ChatFAI characters Hey everyone, ChatFAI has a special connection with this community because this is where I got it started. It was a simple web app that allowed you to interact with your favorite characters from movies, TV shows, books, history, and beyond. Now, it is a lot more. It has public APIs and an official Discord bot integration now. A lot of performance improvements have been made in the recent days. People have created a lot of characters (https://chatfai.com/characters) The Discord bot is still a new area so could you share feedback if you guys check it out? You can also find it in the Discord app directory. submitted by /u/usamaejazch [link] [comments]  ( 9 min )
    AI-powered drone beats human champion pilots | "Swift AI used technique called deep reinforcement learning to win 15 out of 25 races against world champions"
    submitted by /u/Tao_Dragon [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/30/2023
    Tesla is about to flip the switch on its $300 million new AI cluster, featuring 10,000 Nvidia H100 compute GPUs.[1] Intel has revealed two new Intel Xeon processors this week at Hot Chips 2023 to give designers new options for efficient server-level performance.[2] General Motors is using conversational AI chatbots to handle simple OnStar calls, freeing up the service’s human employees to address more complex requests, the company said Tuesday.[3] Microsoft announces Turing Bletchley v3 vision-language model for Bing image searches.[4] Sources: [1] https://www.tomshardware.com/news/teslas-dollar300-million-ai-cluster-is-going-live-today [2] https://www.allaboutcircuits.com/news/intel-reveals-two-new-xeon-processor-lines-at-hot-chips-2023/ [3] https://www.theverge.com/2023/8/29/23849390/gm-google-cloud-ai-chat-bot-onstar [4] https://www.neowin.net/news/microsoft-announces-turing-bletchley-v3-vision-language-model-for-bing-image-searches/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    What type of model(s) do you think Spotify are using for their DJ feature to seamlessly transition every song? It’s not as easy as just crossfading for x seconds, every one is beat-matched quite literally like a real DJ.
    submitted by /u/sardoa11 [link] [comments]  ( 9 min )
    US Copyright Office seeks public input on AI and copyright
    The US government is taking steps to address the complex and controversial issues around AI and intellectual property rights. The US Copyright Office is opening a public comment period on August 30th to hear from various stakeholders on the topic. Important Details: The agency is asking for comments on three main questions: How should AI be defined and categorized for the purposes of copyright? What are the implications of AI for the rights of authors and owners of works? What are the implications of AI for the liability and responsibility of users and distributors of works? The agency also wants to hear about related issues, such as: how AI may affect publicity rights and unfair competition laws. The agency notes that AI may create works that mimic or impersonate the voices, likenesses, or styles of real people, which could raise ethical and legal concerns. Finally, they want to determine how AI may affect moral rights and cultural heritage: The agency acknowledges that AI may create works that are derivative or transformative of existing works, which could affect the reputation and integrity of the original creators and their communities. The deadline to submit your comments is October 18th and specific instructions for submitting comments are available on the Copyright Office website. P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech—stay updated in under 3 mins/day. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Can you tell it’s artificial?
    I was playing around with the Eleven labs V2 multilingual model and I have to say it’s extremely impressive. Does this sound like the real Tucker? submitted by /u/Exitium_Maximus [link] [comments]  ( 9 min )
  • Open

    Sonnets are square
    In his book How to Read Literature Like a Professor, Thomas Foster says that if a poem looks like a square on the printed page, it’s likely a sonnet. The miracle of the sonnet, you see, is that it is fourteen lines long and written almost always in iambic pentameter. … suffice it to say […] Sonnets are square first appeared on John D. Cook.  ( 4 min )
  • Open

    Use Amazon SageMaker Model Cards sharing to improve model governance
    One of the tools available as part of the ML governance is Amazon SageMaker Model Cards, which has the capability to create a single source of truth for model information by centralizing and standardizing documentation throughout the model lifecycle. SageMaker model cards enable you to standardize how models are documented, thereby achieving visibility into the lifecycle of a model, from designing, building, training, and evaluation. Model cards are intended to be a single source of truth for business and technical metadata about the model that can reliably be used for auditing and documentation purposes. They provide a fact sheet of the model that is important for model governance.  ( 10 min )
    Use Amazon SageMaker Model Card sharing to improve model governance
    One of the tools available as part of the ML governance is Amazon SageMaker Model Cards, which has the capability to create a single source of truth for model information by centralizing and standardizing documentation throughout the model lifecycle. SageMaker model cards enable you to standardize how models are documented, thereby achieving visibility into the lifecycle of a model, from designing, building, training, and evaluation. Model cards are intended to be a single source of truth for business and technical metadata about the model that can reliably be used for auditing and documentation purposes. They provide a fact sheet of the model that is important for model governance.  ( 10 min )
  • Open

    WeatherBench 2: A benchmark for the next generation of data-driven weather models
    Posted by Stephan Rasp, Research Scientist, and Carla Bromberg, Program Lead, Google Research In 1950, weather forecasting started its digital revolution when researchers used the first programmable, general-purpose computer ENIAC to solve mathematical equations describing how weather evolves. In the more than 70 years since, continuous advancements in computing power and improvements to the model formulations have led to steady gains in weather forecast skill: a 7-day forecast today is about as accurate as a 5-day forecast in 2000 and a 3-day forecast in 1980. While improving forecast accuracy at the pace of approximately one day per decade may not seem like a big deal, every day improved is important in far reaching use cases, such as for logistics planning, disaster management, agr…  ( 93 min )
  • Open

    Meet Five Generative AI Innovators in Africa and the Middle East
    Entrepreneurs are cultivating generative AI from the west coast of Africa to the eastern edge of the Arabian Desert. Gen AI is the latest of the big plans Kofi Genfi and Nii Osae have been hatching since they met 15 years ago in high school in Accra, Ghana’s capital that sits on the Gulf of Read article >  ( 7 min )
    Morphobots for Mars: Caltech Develops All-Terrain Robot as Candidate for NASA Mission
    Academics Mory Gharib and Alireza Ramezani in 2020 were spitballing a transforming robot that is now getting a shot at work that’s literally out of this world: NASA Mars Rover missions. Caltech has unveiled its multi-talented robot that can fly, drive, walk and do eight permutations of motions through a combination of its skills. They Read article >  ( 6 min )
    GeForce NOW Gets Wild, With ‘Party Animals’ Leading 24 New Games in September
    Just like that, summer falls into September, and some of the most anticipated games of the year, like the Cyberpunk 2077: Phantom Liberty expansion, PAYDAY 3 and Party Animals, are dropping into the GeForce NOW library at launch this month. They’re part of 24 new games hitting the cloud gaming service in September. And the Read article >  ( 8 min )
  • Open

    AI Frontiers: AI in India and beyond with Sriram Rajamani
    In this episode of the Microsoft Research Podcast, Managing Director of Microsoft Research India Sriram Rajamani discusses how generative AI is impacting the lab’s approach to research and how the country’s many languages can help advance conversational systems. The post AI Frontiers: AI in India and beyond with Sriram Rajamani appeared first on Microsoft Research.  ( 30 min )
  • Open

    4 data compliance standards to know for 2023
    Data is crucial in most industries today. As the amount of business information grows, so do the standards for people’s protection of their personal information. With advanced cyberattacks, security compliance frameworks and cybersecurity have become essential fields to ensure data is collected, organized, stored, and managed in a safe way. This article will start by… Read More »4 data compliance standards to know for 2023 The post 4 data compliance standards to know for 2023 appeared first on Data Science Central.  ( 24 min )
    How the LDMs in knowledge graphs can complement LLMs
    Large language models (LLMs) fit parameters (features in data topography) to a particular dataset, such as text scraped off the web and conformed to a training set.  Logical data models (LDMs), by contrast, model what becomes shared within entire systems. They bring together the data in a system with the help of various kinds of… Read More »How the LDMs in knowledge graphs can complement LLMs The post How the LDMs in knowledge graphs can complement LLMs appeared first on Data Science Central.  ( 21 min )
  • Open

    In-Datacenter Performance Analysis of a Tensor Processing Unit
    submitted by /u/recklessdesuka [link] [comments]  ( 9 min )

  • Open

    What is your favorite AI website for research?
    I work in science research and want to introduce new tools to my students. We are looking for AI that can read tables, charts, figures, and spreadsheets, and possibly run statistics on this information. We are also looking for AI that can be given a prompt and will write on chosen topic with proper citation of sources. This information will not be used for publication, but rather, to organize main ideas and provide examples. An art AI that can draw or mimic images of real insects would be nice as well. Preferably these will all be free to use. submitted by /u/wolfmonarchyhq [link] [comments]  ( 9 min )
    Can You Solve a Time-Traveling Puzzle Designed by GPT-4? Win Bitcoin (100$) & Save the Future!
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
    Shifting order in multiple-choice questions massively affects LLM performance
    Recent research proposes that Large Language Models (LLMs) may not be as reliable as we think. In fact, the order of options in a multiple-choice question drastically influences the responses from LLMs such as GPT-4 and InstructGPT. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/dxfsq72kzalb1.png?width=1289&format=png&auto=webp&s=e4ed5b541073bde18d2865f2c15e8028388070f5 What are the findings? LLM sensitivity to multiple-choice arrangement: The study suggests if options in multiple-choice questions are reordered, the LLM's performance varies dramatically— approximately 13% to 75% depending on the benchmark. Positional bias shapes responses: When the LLM is uncertain between top-selected answers, the option positioning can artificially lean its predictions. Observations also found that LLMs favor specific placements when unsure of the optimal response among top-selected answers. Performance improves when calibration techniques are applied: Making use of two unique calibration methods, the performance of LLMS saw up to eight percentage points of increase across numerous models and benchmarks. Why does this matter? This moves us closer to identifying the factors contributing to LLMs' sensitivity and highlights the significance of recognizing and confronting these sensitivities to improve real-world usability and reliability. P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech—stay updated in under 3 mins/day. (arXiv) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    The possibilites of a trader AI with infinite profits
    AI can also be dangerous because it could make automated trades in the stock market or cryptocurrency market, and because it remembers all the exchange rate changes in history and the entire economic history of the world and also has all the statistics and mathematical knowledge at its fingertips, it can easily draw conclusions and create an algorithm that might make you bigger profits than any real human. He could also learn from his own mistakes and keep improving. Is this possible? Are there any AIs like this already? submitted by /u/Steve_Hufnagel [link] [comments]  ( 9 min )
    OpenAI Surges Past $1 Billion in Revenue As Demand For AI Explodes
    OpenAI is reportedly making strides in its financial performance and is on track to make $1 billion in revenue over the next 12 months, as per recent reports by The Information. This is a major milestone, signifying not only the success of OpenAI but also the increasing demand and investment in AI. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/afak3xrevalb1.jpg?width=660&format=pjpg&auto=webp&s=cd8f20ac732618b91dbf96928ede42d693f6c4a9 Why should we pay attention? Setting expectations: The Information estimates OpenAI's monthly revenue to be around $80 million, in line with the $1 billion yearly revenue prediction. Undeniably, OpenAI is accelerating. AI Chatbots are in high demand: ChatGPT, OpenAI's phenomenal co…  ( 10 min )
    Wearable Health (WHSI) Joins AI Research Lab for Wearable Health Data
    Wearable Health Solutions to Advise Next Realm AI on Medical Internet of Things (MIoT) Solutions NEWPORT BEACH, CA / ACCESSWIRE / Wearable Health Solutions Inc. (OTC PINK:WHSI) announced inclusion to Next Realm AI research lab to explore development of healthcare IoT solutions utilizing data analytics and artificial intelligence (AI). Wearable Healthcare Solutions will collaborate and advise Next Realm AI, an artificial intelligence and data analytics research lab located in New York City, on such areas as collecting and developing data solutions within the areas of wearables, IoT, and Medical Internet of Things (MIoT). As an official IBM Business Partner, Next Realm AI assists lab members in integrating leading-edge AI and data solutions into their business operations. By leveraging Next Realm's expertise, clients can modernize processes, boost efficiency, strengthen security, and deliver greater value to customers - all while driving growth and building value. https://www.otcmarkets.com/stock/whsi/news/Wearable-Health-Solutions-to-Advise-Next-Realm-AI-on-Medical-Internet-of-Things-MIoT-Solutions?id=411692 submitted by /u/NextRealm_AI [link] [comments]  ( 9 min )
    What are potential careers to take in the field of artifical intelligence?
    I am 23 year old man, I have a degree in Politics, Philosophy, & Economics. Next year I want to do a masters degree, but I haven't chosen which one yet. I am both fascinated by AI, and want to be future-proof in my education. What potential careers do you see, currently or in the near future, in the field of AI, and what studies would you recommend to be well prepared for them? ​ submitted by /u/ApplePenguinBaguette [link] [comments]  ( 9 min )
    IBM invests in $4.5 billion A.I. unicorn Hugging Face | Fortune
    IBM’s CEO, who froze hiring for thousands of back-office jobs and predicted A.I. would take up to 50% of new jobs, just piled into a $4.5 billion tech unicorn’s massive new $235 million funding round submitted by /u/AminoOxi [link] [comments]  ( 9 min )
    Singularity Day just got closer because of Nvidia?
    New advances in AI hardware are making the singularity more likely. AI systems will be able to learn and process information much faster, which could lead to a breakthrough in AI capabilities. These advancements include quantum computing and neuromorphic computing, but more specifically the rise of affordable models like NVIDIA H100 and, more recently, GH200 models. If you are interested in this kind of information, there are more details here. submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 9 min )
    Where do AI adult websites get their models from?
    Where do websites like made.porn, pornify.cc or porn.ai get their AI models from? submitted by /u/mixedfeelingz [link] [comments]  ( 9 min )
    Looking for a simulated browser
    Like custom world descriptions, AI apps/sites, etc submitted by /u/roblox22g [link] [comments]  ( 9 min )
    AI Robots from Sci-Fi Movies you didn’t know about
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 9 min )
    Tesla is powering up its $300 Million AI Supercomputer Today
    Tesla's making a significant power move today as it prepares to bring its brand-new new AI-cluster online. Rocking a hefty 10,000 Nvidia H100 compute GPUs, the machine will tackle high-performance computing (HPC) workloads and AI applications, placing Tesla's capabilities among the global AI elite. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/c5ykgmr6r6lb1.png?width=970&format=png&auto=webp&s=8b830c8754c1a11792149f57d19a57a77fc8b161 Here’s why this matters: This Nvidia H100-based AI supercomputer will be one of the most powerful globally. With a peak performance of 340 FP64 PFLOPS and 39.58 INT8 ExaFLOPS for AI programs, even Leonardo, currently the fourth highest-performing supercomputer, is surpassed. Tesla’s…  ( 10 min )
    One-Minute Daily AI News 8/29/2023
    Research firm SemiAnalysis has declared that Google’s anticipated Gemini AI model will smash OpenAI’s offering by packing a lot more computing power.[1] DoorDash today announced its development of voice ordering capabilities incorporating AI, building on its existing model leveraging best-in-class agents, to further support restaurant operations.[2] The US Air Force wants $6 billion to build a fleet of AI-controlled drones.[3] Google’s DeepMind says it has cracked a problem that has vexed those trying to verify whether images are real or created by AI. Researchers proclaimed their new watermarking SynthID format can be used to pinpoint AI-generated deepfakes without distorting the image’s original quality.[4] Sources: [1] https://beincrypto.com/ai-wars-google-gemini-chatgpt/ [2] https://about.doordash.com/en-us/news/introducing-ai-and-agent-powered-voice-ordering [3] https://www.engadget.com/the-air-force-wants-6-billion-to-build-a-fleet-of-ai-controlled-drones-204548974.html [4] https://gizmodo.com.au/2023/08/deepmind-says-it-has-a-way-to-identify-ai-images-but-only-on-google/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Microsoft's new AoT aims to create more human-like AI
    Microsoft teamed up with Virginia Tech to publish a white paper introducing their new "Algorithm of Thoughts" (AoT). The objective? To make language learning models akin to human learning. https://preview.redd.it/3q8lcq0k96lb1.png?width=2000&format=png&auto=webp&s=fce7a0e3225c64353ad6c51f65e8b490e795feed If you want to stay on top of the latest trends and insights in AI and tech, look here first. What's the big idea? Microsoft's AoT aims to fuse the accuracy of algorithms with the nuances of human reasoning. A bold aspiration indeed, but not a new one. The goal to empower computers to learn for themselves – akin to human cognition - has been an AI objective since its inception back in the 1950s. The AoT could be seen as an attempt to resolve the drawbacks of the "Chain of Thought" (CoT) approach. LLMs following the CoT approach can provide incorrect steps to the right answer, as they base conclusions on precedent. With AoT, the model works to evaluate the soundness of initial steps or "thoughts," reducing the risk of one incorrect step leading to disproportionate results. What could AoT do? Mitigate AI "hallucinations:" These funny— but disconcerting — instances of AI outputting false information. Enhance the integrity of AI interaction: programmers suggest that improvement in this aspect is crucial for aligning AGI (artificial general intelligence). The takeaway: AI's ability to understand and process information like a human being is a longstanding goal in the field. With AoT, Microsoft seems to be making strides toward achieving it. Much remains to be seen on its efficacy: How it will impact the broader AI ecosystem and the user experiences it can create. P.S. If you like this kind of analysis, I write a free newsletter tracking the most relevant news and research in AI and tech—stay informed in under 3 minutes/day. (source) ​ submitted by /u/AIsupercharged [link] [comments]  ( 10 min )
  • Open

    Designing Deep Networks to Process Other Deep Networks
    submitted by /u/nickb [link] [comments]  ( 9 min )
  • Open

    [D] Will I get in? [Fall 2024 MS in ML European Universities]
    TLDR: American, Graduated from U of Michigan in 2019 w/ 3.3 GPA Bs in Comp Sci. Worked at Google for 3 years. Samsung Research America for 3 months. No ML specific work experience. No research. Will I get in to European elite ML programs? If not what do I need to do? My GF and I want to study our masters together in Europe. She’s doing business I want to do ML/AI. I spent the past year kind of goofing off. Got kind of burned out and decided I was going to get into music production so spent the pst year mainly doing that with some software mixed in. Recently been self studying ML, both the math from textbooks and trying my hand at some models in python. I do not have any connections to academia currently and wil have to beg a professor who barely knew me from undergrad for a rec. Can get other recs from past bosses. My plan right now is to look for job hopefully in AI but maybe just more general software engineering again, but long term I want to get a masters in person. My current resume looks like: Graduated BS in Comp Sci from univ of Michigan 2019 3.3 GPA Worked at Google for 3 years Worked at Samsung Research America for 4 months Some self study I can claim but not much tangible proof Recommendation from Google Boss (Maybe) recommendation from UofM CS professor that barely knew me My questions to anyone that knows the admissions right now are: 1) Do you think I get to one off this? (To anyone of these schools) 2) If not what are the things to prioritize to improve my chances? What are the timeline of these steps? Can I do them in the next few months or have to wait till next year? submitted by /u/Srokisthename [link] [comments]  ( 10 min )
    [R] DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data
    I just came across this paper, and it just sounds too good to be true. If we regularly spend up to 80% of our time in data preprocessing, this method would suddenly return us A LOT of that time. Has anyone seen it in python code? I haven't found it and I'd love to give it a try with some of my datasets from hell. They do have a GitHub page but I'm too dumb or too noob to make it run in my laptop. submitted by /u/Davidat0r [link] [comments]  ( 9 min )
    [D] Knowledge graph vs text summary+embedding for long term conversational memory
    Hi, I'm relatively new to the space of AI chatbots and I figured I'd get my hands wet with a small personal project. While researching the topic of long term conversational memory I noticed most people are using text embedding in combination with textual summary to generate a conversation history for the AI's prompt. However, this technique seems to have many drawbacks such as loss of details in the summarization process. I was wondering if anyone has experience using knowledge graph DBs like neo4j for conversational memory instead, and what the pros and cons of such an approach are compared to summarization. I'd be greatly interested in any resources that could further my knowledge in this space as my primary goal is to learn from this project. Thanks! submitted by /u/Rainmire [link] [comments]  ( 9 min )
    [Discussion] Does anybody manage to make MuseTree work?
    https://stevenwaterman.uk/musetree/ It's for music generation through musenet. I don't manage to generate anything. It has to be related to API issues or stuff like that? submitted by /u/MusicalSeries [link] [comments]  ( 9 min )
    Shifting order in multiple-choice questions massively affects LLM performance [R]
    Recent research proposes that Large Language Models (LLMs) may not be as reliable as we think. In fact, the order of options in a multiple-choice question drastically influences the responses from LLMs such as GPT-4 and InstructGPT. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/k8yaixbjzalb1.png?width=1289&format=png&auto=webp&s=99ac6280a1e7415f46c0c11938ae20e2b77674b4 What are the findings? LLM sensitivity to multiple-choice arrangement: The study suggests if options in multiple-choice questions are reordered, the LLM's performance varies dramatically— approximately 13% to 75% depending on the benchmark. Positional bias shapes responses: When the LLM is uncertain between top-selected answers, the option positioning can artificially lean its predictions. Observations also found that LLMs favor specific placements when unsure of the optimal response among top-selected answers. Performance improves when calibration techniques are applied: Making use of two unique calibration methods, the performance of LLMS saw up to eight percentage points of increase across numerous models and benchmarks. Why does this matter? This moves us closer to identifying the factors contributing to LLMs' sensitivity and highlights the significance of recognizing and confronting these sensitivities to improve real-world usability and reliability. P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech—stay updated in under 3 mins/day. (arXiv) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [D] Handwritten Text Recognition (OCR) on Historical Documents
    I am working on developing a solution to transcribe historic texts (Pre-1900's) which are all handwritten. I have some data, around 1000's transcribed sentences with their corresponding images of text. TrOCR looked great, but it still makes a lot of mistakes, probably because of the old English phraseology, so I tried to finetune it with my data and see if it improves and that didn't happen. The data I used to train was my 1000 sentences + some public dataset with another 2500 sentences, so just about 3500 sentences in total. Do you think it's because the data is small, that the performance is bad? I'm finetuning "microsoft/trocr-base-stage1" using native PyTorch. If not TrOCR do you recommend any OCR/HTR models I can finetune to my handwritten historical data? I truly appreciate any guidance you send my way. submitted by /u/daxow [link] [comments]  ( 9 min )
    [P] FlowJax - Normalizing flows in JAX
    Hello everyone, Hopefully this is of interest to some of you. For those that don't know, normalising flows can be used as black-box unconditional or conditional distribution approximators, that support both exact sampling and density evaluations. For an excellent review see https://arxiv.org/abs/1912.02762. I am developing flowjax, a Python package for normalising flows, distributions and bijections. It uses Jax for automatic differentiation, and the equinox framework built by Patrick Kidger to allow for a familiar object-oriented design. It includes many powerful flows, e.g. masked autoregressive flows, coupling flows and block neural autoregressive flows. In addition to inheriting some benefits from using JAX (easy GPU support, some efficiency gains), here's a few points where I think flowjax has some advantages over other packages: Comprehensive documentation Simplified definitions of unconditional/conditional bijections and distributions (particularly nicer handling of the conditional case, which some packages seem to stitch in as an afterthought). Easy to plug in different "transformer" bijections to coupling/masked autoregressive flows. Use of efficiency tricks to optimize run times (e.g. circumventing recompilation of identical layers using jax.lax.scan over the flow layers) It has been used in a couple of papers already, but it would be great to have some more people using it and some feedback/suggestions/contributions. There are examples in the documentation for those that are interested. submitted by /u/LimitedConsequence [link] [comments]  ( 9 min )
    [Project] Models for Unsupervised Anomaly Detection of a Single Continuous Feature?
    Of the many unsupervised anomaly detection models out there (iforest, LOF, SVM etc) I am struggling to find a model that makes sense to use to detect anomalies in a single target feature. My current strategy is to subset the data into different categories and run iforest on a single column. I feel as though this method might not be the best because it basically creates a tree with a single branch and measures how many nodes away a given record might be. My confidence scores never seem to exceed around -.17 on a scale of [-1,1] where -1 tends to more confidence in anomalous behavior Is there a better way? Note: Anomalies in my data occur very infrequently submitted by /u/BeefaroniX [link] [comments]  ( 9 min )
    [Discussion] How are you evaluating and monitoring LLMs?
    Question for people who are implementing LLMs (open source, fine tuned, any kind). How do you know that your getting the quality output from the model that you need to ship the feature or model? Are the audits ad hoc data sampling and subjective "good/bad" ratings or have you figured out a more rigorous framework? Is it pretty much ~vibes~ based? What, if any, tools or processes are you putting into place to monitor and observe the LLM when its interacting with real time user data for weeks or months? Most of the folks I have spoken with are doing very ad hoc sampled output and writing down on post its or in a spreadsheet a subjective quality ratings. One person had developed a slightly more rigorous 3 question survey on "is the result factual", "is the result cogent" and "is the result useful". Not everyone is logging their LLM responses they show users which feels very risky to me. Anyone aware of any industry standards being established around this? submitted by /u/Andy-VertaAI [link] [comments]  ( 9 min )
    [D] A blog post on Yet Another ICML Award Fiasco
    I wrote a blog post on the ICML award fiasco: They gave an outstanding paper award to the D-Adaptation paper, that contains worse results that the ones in papers from 9 years ago. Also, this is not the first time that ICML gives awards to questionable or even plainly wrong papers. I believe this might start a serious conversation about "stochastic" awards, and the super noisy reviews in machine learning conferences. https://parameterfree.com/2023/08/30/yet-another-icml-award-fiasco/ submitted by /u/bremen79 [link] [comments]  ( 9 min )
    [D] Is there a monthly limit for OpenAI service in Azure?
    When using OpenAI's api, there's a default limit of $120/month and my company is about to hit it. I plan on requesting an increase of that limit... but wondering, does Azure's OpenAI service have any monthly limit? By looking at their quotas: https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits it doesn't seem like there's a monthly cap. Is this correct? If so, I see no reason why anyone would use OpenAI's api instead of Azure's, as they cost the same but there's no usage limit. Especially if you expect to increase api usage in the future. submitted by /u/alkibijad [link] [comments]  ( 9 min )
    [P] I created GPT Pilot - a research project for a dev tool that uses LLMs to write fully working apps from scratch while the developer oversees the implementation - it creates code and tests step by step as a human would, debugs the code, runs commands, and asks for feedback.
    Github: https://github.com/Pythagora-io/gpt-pilot Detailed breakdown: https://blog.pythagora.ai/2023/08/23/430/ For a couple of months, I've been thinking about how can GPT be utilized to generate fully working apps, and I still haven't seen any project that I think has a good approach. I just don't think that Smol developer or GPT engineer can create a fully working production-ready app from scratch without a developer being involved and without any debugging process. So, I came up with an idea that I've outlined thoroughly in the blog post above, but basically, I have 3 main "pillars" that I think a dev tool that generates apps needs to have: Developer needs to be involved in the process of app creation - I think that we are still far away from an LLM that can just be hooked up to …  ( 11 min )
    [D] Graph Signal Processing Applications and Training
    I'm studying GSP and I'm stuck on the definition of the Graph Fourier Transform. The sigma notation and signal makes sense, but why is there an \"i\" term at the eigenvector mu? Shouldn't the eigenvector not depend on the \"i\"? And if it does, what does the \"i\" imply? submitted by /u/Ihaveaparrot [link] [comments]  ( 9 min )
    [D] - Given that we can lossily transform text to images and vice versa, multimodality should not be required for AGI or the construction of world-models. Any causal relationship that can be inferred from images/audio/video should be inferable from text.
    Consider video data that captures various interactions between entities—let's say Person A and Person B. We then apply a video summarization network T(x), where x is some video or an entity in the video, onto the video. For sake of argument, let's assume T(x) provides a description of x so detailed that we can decode the description back into the original video without losing much information via some arbitrary text-video model. Now, if we can infer a causal relationship in the video—like Person A punching Person B—then logically, an isomorphic relationship should also be inferable from the text encodings T(A) and T(B) (unless that relationship is one of the small pieces of information lost during the lossy transformation). After all, the encoding is just another representation of the same…  ( 11 min )
    [D] HPC from local servers for deep learning as well as simpler tasks
    Hi all, the company I am working at has several servers used for different tasks including data analysis and machine learning, including smaller tasks as well as deep learning. What are some ways/ technologies they could create a distributed system where users can submit their jobs and they are dispatched automatically? I was thinking of having an entry node that is the only one faced by users, is where all conda environments are and jobs can be submitted from there. Please let me know if you have any suggestions/ tools that you know that would make sense. Thanks in advance! submitted by /u/returnname35 [link] [comments]  ( 9 min )
    [D] HPC from local servers for deep learning as well as simpler tasks
    Hi all, the company I am working at has several servers used for different tasks including data analysis and machine learning, including smaller tasks as well as deep learning. What are some ways/ technologies they could create a distributed system where users can submit their jobs and they are dispatched automatically? I was thinking of having an entry node that is the only one faced by users, is where all conda environments are and jobs can be submitted from there. Please let me know if you have any suggestions/ tools that you know that would make sense. Thanks in advance! submitted by /u/returnname35 [link] [comments]  ( 9 min )
    [P] Self-Hosting a 16B LLAMA 2 Model in the Banking Sector: What Could Go Wrong?
    I've received a freelance job offer from a company in the banking sector that wants to host their own LLAMA 2 model in-house. I'm hesitating to accept the gig. While I'll have access to the hardware (I've estimated that an A100 80GB will be required to host the 16B parameter version and process some fine-tuning & RAG), I'm not familiar with the challenges of self-hosting a model of this scale. I've always relied on managed services like Hugging Face or Replicate for model hosting. For those of you who have experience in self-hosting such large models, what do you think will be the main challenges of this mission if I decide to take it on? ​ Edit: Some additional context information Size of the company: Very small ~ 60 employees Purpose: This service will be combined with a vector store to search content such as Word, Excel and PowerPoint files stored on their servers. I'll implement the RAG pattern and do some prompt engineering with it. They also want me to use it for searching things on specific websites and APIs, such as stock exchanges, so I (probably) need to fine-tune the model based on the search results and the tasks I want the model to do after retrieving the data. submitted by /u/IMissEloquent75 [link] [comments]  ( 9 min )
    [D] Is there anything LangChain can do better than using LLMs directly (either through a website or an API), any examples? Why would someone choose to use it?
    I haven't used ChatGPT a lot or any other LLMs, I've been reading about Langchain and its use cases, and I'm having trouble wrapping my head around exactly what it does. From what I understand, its an alternative interface for LLMs, allowing for easy switching between them, and makes some work for specific use cases easier. If I wanted to write an app or script to interact with LLMs and do other tasks, how would LangChain be better than just making API call(s) to an LLM, getting back the result as a string, and doing whatever with it? submitted by /u/TheTwelveYearOld [link] [comments]  ( 9 min )
    [D] Using LLMs in Production - Model Fallbacks Tutorial + Caching
    Hello r/MachineLearning I'm one of the maintainers of https://github.com/BerriAI/litellm/ - open-source library to call all LLM APIs using the OpenAI format [Anthropic, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.]. I'm writing this post to share some of the strategies we use for using LLMs in production, we've served over 2M+ queries so far TLDR: Use Caching + Model Fallbacks for reliability. This post goes into detail of our fallbacks implementation Using LLMs reliably in production involves the following components: Caching - Cache Embedding() and Completion() for all models Model Fallbacks - set fallback_models=['gpt-3.5-turbo', 'command-nightly', 'llama2]. If primary model fails try fallback models. This deals with rate-limiting errors and when Provider APIs go down …  ( 10 min )
    [D] Decision Transformer Alignment should be better than DeepMind ReST
    We've done some experiments recently, see the tech report: https://arxiv.org/abs/2308.12050v1 We train an SFT model and an RM model, then align the LLM with DT/MLE with filtering (ReST) + RM /SFT datasets/SFT model-generated samples https://preview.redd.it/195op5q636lb1.png?width=1081&format=png&auto=webp&s=a9fa862e8a9ab05819484af8619f73d918fdc26a DT is the Decision Transformer alignment MLE is the ReST-like alignment https://preview.redd.it/u6x28fook5lb1.png?width=1118&format=png&auto=webp&s=4a87898129c1238c00071d43809f5daf440b26d8 submitted by /u/seventh_day123 [link] [comments]  ( 9 min )
  • Open

    Could anyone help me why the following list is the optimal policy for this environment? (Reference: Sudharsan's Deep RL book)
    ​ https://preview.redd.it/qderz9bsoblb1.png?width=1195&format=png&auto=webp&s=fb8ec749d0ce5000e66951b173228278a1d4c3a3 submitted by /u/nimageran [link] [comments]  ( 9 min )
    Could anyone help me why the following list is the optimal policy for this environment? (Reference: Sudharsan's Deep RL book)
    ​ https://preview.redd.it/qderz9bsoblb1.png?width=1195&format=png&auto=webp&s=fb8ec749d0ce5000e66951b173228278a1d4c3a3 submitted by /u/nimageran [link] [comments]  ( 9 min )
    Help With RLLib/ Alternatives
    RLLib is currently stealing my remaining sanity, so I'm making a desperate scream into the void. I can't get my troubleshooting right. I built a nice, custom Gym env that I've been running with SB3 and I feel like I'm caught in an endless array of errors, currently: ValueError: The two structures don't have the same nested structure. I can't help but feel like I'm going about this wrong and missing important information on how to do this correctly. The RayLlib Forum hasn't really been filled with people, so I'm asking: Does anyone know of a Debugging Manual/ A Discord Server/ A Migration Guide? submitted by /u/tessherelurkingnow [link] [comments]  ( 9 min )
    Recommendations for RL Library for 'unvectored' environments
    Hi, I'm working on a problem which has a custom gym environment which I've made, and as it interacts with multiple other modules which have their own quirks, I need to use a reinforcement learning library which works in a specific way that I've only seen PFRL use. The training loop needs to be in this format: 'obs, reward, done = agent.step(action)', 'agent.observe(obs, reward, ... )' rather than what I see in most modern RL libraries where you define an agent and then run a '.train()' method. Are there any libraries which work in this way? I'd love to use something like StableBaselines but they don't seem to play nice and I'd rather not rewrite the gym environment if I can avoid it. Thanks submitted by /u/return_reza [link] [comments]  ( 9 min )
    MDPs: gentle tutorial ...
    Markov Decision Processes (MDPs) form the cornerstone of reinforcement learning (RL) and serve as a fundamental modeling tool for making sequential decisions. In this note, we present a comprehensive definition of MDPs and provide a detailed derivation of the Bellman equations, along with the optimality results. Our approach aims to ensure a thorough understanding by avoiding the omission of any steps in the mathematical proofs. The primary goal is to facilitate reading classic textbooks on (approximate) dynamic programming, optimal control, and reinforcement learning, where proofs and derivations can sometimes obscure crucial details, making them less accessible to readers from diverse scientific and engineering backgrounds. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4535241 submitted by /u/omroot [link] [comments]  ( 9 min )
    Do you know Poker Ai gyms for adversarial policy trainings?
    I want to try use adversarial policies (https://arxiv.org/abs/1905.10615) against poker no limit holdem 6-9 players RL models. I was looking for open-ai gym like environement for that project. Im looking for: - access to game state from each player persperctive (to create input for adversarial model) - support for custom bets (not limited to 0, 1/2pot, all-in) - build-in RL models / support for opensoure RL models - option to add custom model as player So far I found those and read readme files: https://github.com/dickreuter/neuron_poker https://github.com/fschlatt/clubs_gym https://rlcard.org/ https://www.deepmind.com/open-source/openspiel Did anybody work on similar project? Which gym did you use, and what experience do you have with it? Since, adversarial policies tend to work better for high-dimensionality I would prefer to 6players variant. I know that modern poker ai approach are not based on pure RL, but I want to check how vulnerable are classic RL poker models. submitted by /u/MrCogito_hs [link] [comments]  ( 9 min )
    Twitter / Machine Learning Community
    submitted by /u/x9182 [link] [comments]  ( 9 min )
    Reinforcement learning environment for cyber security automation
    submitted by /u/limmen [link] [comments]  ( 9 min )
    How do I teach my PPO agent to play Breakout?
    I have coupled my agent with EnvPool in order to speed up the learning process. It seems to be playing Pong in less than an hour. However, when I try to make it Breakout, even after many hours it still struggles. Also, it it seems like the network is facing catastrophic forgetting as after a few hours it's performance suddenly deteriorates. Any ideas to fix this? I tried incorporating major ideas for PPO from here. Here's my code. Feel free to let me know if you have any questions. Since I have incorporated EnvPool, the code won't run in Windows anymore. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
  • Open

    Deploy self-service question answering with the QnABot on AWS solution powered by Amazon Lex with Amazon Kendra and large language models
    Powered by Amazon Lex, the QnABot on AWS solution is an open-source, multi-channel, multi-language conversational chatbot. QnABot allows you to quickly deploy self-service conversational AI into your contact center, websites, and social media channels, reducing costs, shortening hold times, and improving customer experience and brand sentiment. In this post, we introduce the new Generative AI features for QnABot and walk through a tutorial to create, deploy, and customize QnABot to use these features. We also discuss some relevant use cases.  ( 13 min )
    Automatically generate impressions from findings in radiology reports using generative AI on AWS
    This post demonstrates a strategy for fine-tuning publicly available LLMs for the task of radiology report summarization using AWS services. LLMs have demonstrated remarkable capabilities in natural language understanding and generation, serving as foundation models that can be adapted to various domains and tasks. There are significant benefits to using a pre-trained model. It reduces computation costs, reduces carbon footprints, and allows you to use state-of-the-art models without having to train one from scratch.  ( 13 min )
  • Open

    Modeling and improving text stability in live captions
    Posted by Vikas Bahirwani, Research Scientist, and Susan Xu, Software Engineer, Google Augmented Reality Automatic speech recognition (ASR) technology has made conversations more accessible with live captions in remote conferencing software, mobile applications, and head-worn displays. However, to maintain real-time responsiveness, live caption systems often display interim predictions that are updated as new utterances are received. This can cause text instability (a “flicker” where previously displayed text is updated, shown in the captions on the left in the video below), which can impair users' reading experience due to distraction, fatigue, and difficulty following the conversation. In “Modeling and Improving Text Stability in Live Captions”, presented at ACM CHI 2023, we f…  ( 93 min )
  • Open

    Autonomous innovations in an uncertain world
    Jonathan How and his team at the Aerospace Controls Laboratory develop planning algorithms that allow autonomous vehicles to navigate dynamic environments without colliding.  ( 9 min )
  • Open

    Building a “heavy metal quartet” of AI compilers
    A new quartet of AI compilers: Rammer, Roller, Welder, and Grinder, tackle a range of compiler optimization challenges based on the same tile abstraction, providing a comprehensive solution to connect AI models with hardware accelerators. The post Building a “heavy metal quartet” of AI compilers appeared first on Microsoft Research.
    Research Focus: Week of August 28, 2023
    In this issue: An illusion of predictability in scientific results; Kathleen Sullivan named to Insider’s 30 under 40 in healthcare list; FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations. The post Research Focus: Week of August 28, 2023 appeared first on Microsoft Research.  ( 9 min )
  • Open

    AI Lands at Bengaluru Airport With IoT Company’s Intelligent Video Analytics Platform
    Each year, nearly 32 million people travel through the Bengaluru Airport, or BLR, one of the busiest airports in the world’s most populous nation. To provide such multitudes with a safer, quicker experience, the airport in the city formerly known as Bangalore is tapping vision AI technologies powered by Industry.AI. A member of the NVIDIA Read article >  ( 6 min )
    Deepdub’s AI Redefines Dubbing From Hollywood to Bollywood
    In the global entertainment landscape, TV show and film production stretches far beyond Hollywood or Bollywood — it’s a worldwide phenomenon. However, while streaming platforms have broadened the reach of content, dubbing and translation technology still has plenty of room for growth. Deepdub acts as a digital bridge, providing access to content by using generative Read article >  ( 5 min )
  • Open

    First time seeing a rare event
    Suppose you’ve been monitoring a rare event for a long time, then you see your first occurrence on the Nth observation. Now what would you say about the event’s probability? For example, suppose you’re wondering whether dogs ever have two tails. You observe thousands of dogs and never see two tails. But then you see […] First time seeing a rare event first appeared on John D. Cook.  ( 5 min )

  • Open

    AI powered personal assistant in private Beta
    submitted by /u/anehzat [link] [comments]  ( 9 min )
    Stanford's DSPy Framework Revolutionizes AI Language Processing Tasks
    Stanford researchers have unveiled a groundbreaking artificial intelligence (AI) framework known as DSPy. Designed to utilize Language Models (LMs) and Retrieval Models (RMs) optimally, DSPy is set to make AI programming more powerful, intuitive, and efficient. Why does this matter? DSPy was built with complex tasks in mind. LMs, like GPT-3, generate Human-like text from given inputs, while RMs retrieve relevant data. DSPy combines their capabilities, enabling tasks like summarizing information from databases. It works on Pythonic syntax, using declarative and composable modules to instruct LMs. DSPy's automatic compiler finetunes the LM to run any program's steps. it replaces manual intermediate-stage labeling and string manipulation with systematic modular pieces. What's unique about DSPy? It introduces "Signatures" and "Teleprompters" that compile your program. A 'signature' explains the task and inputs for the LM, while Teleprompters improve the effectiveness of prompts. Compared to other libraries, DSPy requires minimal labeling and bootstraps any needed intermediate labels. In short, DSPy simplifies delivering more nuanced instructions to AI and retrieving more detailed and accurate responses, thus widening the spectrum of tasks AIs can accomplish. P.S. (small self-plug) If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech---stay updated in under 3 mins/day. (github) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Google's DeepMind Unveils Invisible Watermark to Spot AI-Generated Images
    As AI image generators increase in popularity, differentiating between authentic and AI-created images is becoming more complex. DeepMind, Google's AI unit, is addressing this by developing an imperceptible watermark known as SynthID for its AI-generated images to counter misinformation. https://i.redd.it/y370eu1tt4lb1.gif Why this matters: DeepMind's SynthID tags AI-generated images: Invisible to people but detectable by computers, this watermark hopes to aid in the verification of images. Technology, however, isn't completely foolproof: DeepMind itself acknowledges that intense image manipulation could compromise the watermark. Google's image generator, Imagen, will only apply to images created using this tool: Google aims to instantly identify AI-generated images with this effectively hidden watermark. DeepMind's head of research, Pushmeet Kohli, shared the following details: The watermark changes on images are so subtle that humans wouldn't notice, yet DeepMind can still detect an AI-generated image. Despite any subsequent cropping or editing, the watermark remains identifiable by DeepMind's software. Colors, contrast, or size changes won't affect it. Calls for a standard approach to AI-generated image identification continue: More coordination between businesses is crucial, different methods adopted by various firms add degrees of complexity in tagging AI content. Other tech giants, including Microsoft and Amazon, pledge to watermark some AI content, meeting similar demands for transparency over AI-generated works. P.S. If you like this kind of analysis, I write a free newsletter that keeps you informed of all you need to know about AI developments in under 3 mins/day. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    New iterative, self-revising language model, SelFee, beating the rest with self-feedback generation
    Introducing SelFee—a reinvented and powerful language model that uses self-feedback and self-revision to generate high-quality responses backed by a team of researchers from KAIST. Unlike previous models, SelFee doesn't rely on external, large-scale language or task-specific models, tipping the scales in the AI world. If you want to stay ahead of the curve in AI and tech, look here first. https://i.redd.it/bgszhpai43lb1.gif Why it matters? SelFee, built on the base of LLaMA-based instruction-following model and fine-tuned, offers a fresh approach - generating an initial solution and self-feedback sequences and then revising its answers until a high-quality response is achieved. Data used for its training and model evaluation was collected from varied sources and fine-tuned with OpenAI API calls, beating the 13B SelFee model with a minimal 7B SelFee model that generated at least three revisions. SelFee proves the potential of iterative revision in enhancing language model responses, indicating that an increase in inference computation of a model may be superior to merely magnifying its size. Features and Limitations: SelFee's effective use of self-feedback significantly improves response quality, avoiding the requirement of external, large-scale language or task-specific models, translating into faster, cost-effective LLM solutions. However, lacking in certain areas compared to ChatGPT, such as math, reasoning, factuality, and coding, SelFee has room for further improvement and growth. The revolution in the AI language model landscape is promising but still an evolving journey, with SelFee being the latest participant driving this change. P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech—stay updated in under 3 minutes/day. (source) (github) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Camouflage AI
    I thought about an AI, that would intake a couple photos of terrain, analyze the color palette, patterns etc and with that information, would choose like 3 existing camouflage patterns that would blend in the best in the terrain where the photos were taken. Does something like that exist? I know that US army has an AI that creates camo with the use of thousands of photos, and that's how MARPAT and Multicam were made, but I'm interested in an AI that would choose from already existing patterns. Does something like this exist? What do you think of this idea? submitted by /u/BrytolGasMasks [link] [comments]  ( 9 min )
    Is ChatGPT Plus worth it? Or should I stay with the free version and use Bard for stuff that requires web access?
    I'm mainly using it for educational purposes. Thank you. Edit: I'm in the Psych field. I use it to make presentations, summaries, ideas based on references like books, websites, journals. submitted by /u/East_Professional385 [link] [comments]  ( 9 min )
    How far off are we from free AI video makers
    So right now as far as I can tell all the AI video makers are things like a few second clip, stable diffusion changing images with other images, or stock images. Oh and that thing that was on Twitch for a short bit. When are we going to get an actual worth while AI video maker? submitted by /u/crua9 [link] [comments]  ( 9 min )
    The Architecture of Thought: Reflective Structures in Mental Constructs
    submitted by /u/alcanthro [link] [comments]  ( 9 min )
    25 Best Movies exploring concept of Artificial intelligence (1968 -2023 ) I bet you haven’t watched all
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 9 min )
    ChatGPT usage remains low, Pew Research suggests, as concerns about AI continue to rise
    The usage and fear surrounding ChatGPT aren't as prevalent as you might think, according to a recent poll from Pew Research. Only 18% of Americans have reportedly used ChatGPT. The demographic that uses it the most? Men aged 18-29 that are college-educated, but even that's just a 30-40% usage rate. https://preview.redd.it/0ax222gxczkb1.jpg?width=620&format=pjpg&auto=webp&s=d3b04169d5de1985d1c52dce7962b5f3a543b014 Why does this matter? ChatGPT has still managed to gain a remarkable level of popularity, despite low usage. This suggests that even though not many people are using it, they are aware of it and its potential capabilities. More people reported using ChatGPT for entertainment or to educate themselves rather than for work. People anticipate AI to have a greater impact on jobs such as software engineers, graphic designers, and journalists. But the expectation is that AI as a whole, not just ChatGPT, will be the driving force behind this. Concern about AI is increasing, not decreasing. 47% of respondents said AI makes them more worried than excited, compared to 31% last year. This concern seems to rise with the level of AI knowledge one possesses. Industries unshaken by AI: As per the survey, employed individuals who are aware of ChatGPT don't see it drastically affecting their jobs. The sectors like hospitality, entertainment, construction, and manufacturing feel the least threatened. Stay updated about AI and its influence on different verticals! Don't miss out on the latest insights, developments, and trajectories of AI. Our free newsletter is all you need to be au fait with the AI world. Keep yourself informed in under 3 minutes/day. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/28/2023
    Chinese e-commerce giant Alibaba has added two new generative AI large language models designed to interpret images to its open-source stable.[1] Several top news publications like The New York Times, CNN and the Australian Broadcasting Corporation (ABC) have blocked Microsoft-backed OpenAI to access their content to train its AI models.[2] Intel on Monday said a new data center chip coming out next year will handle more than double the amount of computing work that can be done for each watt of power used, part of a broader industry push to lower electricity consumption.[3] OpenAI unveiled the new service, dubbed “ChatGPT Enterprise,” in a company blog post and said it will be available to business clients for purchase as of Monday. The new offering promises to provide “enterprise-grade security and privacy” combined with “the most powerful version of ChatGPT yet” for businesses looking to jump on the generative AI bandwagon.[4] Sources: [1] https://voicebot.ai/2023/08/28/alibaba-adds-visual-understanding-to-open-source-generative-ai-large-language-models/ [2] https://www.news18.com/tech/several-top-news-publications-block-openai-from-accessing-their-content-8551840.html [3] https://www.reuters.com/technology/intel-says-new-sierra-forest-chip-more-than-double-power-efficiency-2023-08-28/ [4] https://www.cnn.com/2023/08/28/tech/chatgpt-enterprise-openai/index.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Prompt engineering for GPT4
    My page on PromptBase: https://promptbase.com/profile/singularity99 submitted by /u/No-Transition3372 [link] [comments]  ( 9 min )
    Has AI-By-Learning really been proven impossible?
    I'm curious what people specifically in the artificial intelligence think about the recent work by Iris van Rooij et al. earlier this month. They seem to have proven that current approaches to reaching AGI, like LLMs, are incapable of achieving it. I'm not convinced. I quickly wrote up a full rebuttal piece explaining how not convinced I was. What about everyone else? submitted by /u/alcanthro [link] [comments]  ( 9 min )
  • Open

    [D] Why don't we build models that design/build better models. Too computationally expensive?
    At what point do we create a model to build/design better models? Models = ml architecture submitted by /u/Significant_Water_28 [link] [comments]  ( 9 min )
    [D] Question: What's the future of image-analytics models?
    Hey everyone, first post on this sub so sorry if there's anything wrong. Right now, what are the cutting edge image processing models? This is in the context of the segmentation of specific features from an image (ie. finding the cars in an image of a busy roadway). The reason I am asking is I want to learn more image processing architectures that way I can find better direction for specific research areas to look into. Thanks in advance! :) submitted by /u/Adventurous-Tower392 [link] [comments]  ( 9 min )
    Stanford's DSPy Framework Revolutionizes AI Language Processing Tasks [R]
    Stanford researchers have unveiled a groundbreaking artificial intelligence (AI) framework known as DSPy. Designed to utilize Language Models (LMs) and Retrieval Models (RMs) optimally, DSPy is set to make AI programming more powerful, intuitive, and efficient. Why does this matter? DSPy was built with complex tasks in mind. LMs, like GPT-3, generate Human-like text from given inputs, while RMs retrieve relevant data. DSPy combines their capabilities, enabling tasks like summarizing information from databases. It works on Pythonic syntax, using declarative and composable modules to instruct LMs. DSPy's automatic compiler finetunes the LM to run any program's steps. it replaces manual intermediate-stage labeling and string manipulation with systematic modular pieces. What's unique about DSPy? It introduces "Signatures" and "Teleprompters" that compile your program. A 'signature' explains the task and inputs for the LM, while Teleprompters improve the effectiveness of prompts. Compared to other libraries, DSPy requires minimal labeling and bootstraps any needed intermediate labels. In short, DSPy simplifies delivering more nuanced instructions to AI and retrieving more detailed and accurate responses, thus widening the spectrum of tasks AIs can accomplish. P.S. (small self-plug) If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech---stay updated in under 3 mins/day. (github) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [P] Finetuning an LLM to imitate someone
    Hello all, I'm trying to understand how to get an LLM to imitate someone, say Shakespeare. It's easy enough to get all of Shakespeare's work. If I've understood the current state of play for LLMs, there are three options: Fine tune an LLM Vectorize your knowledge using something like ChromaDB. Do a similarity search after each prompt and get the LLM to "read" the top n docs Do both I have a feeling that to imitate Shakespeare, fine tuning an LLM might work best. However, if my understanding is correct, the inputs to finetune an LLM must be formatted this way: : "To be" : "Or not to be" The gap I'm having trouble bridging is how do I go from a large text file to this input format? The only idea I've come across is format all of the text like so: : "sentence_1" : "sentence_2" : "sentence_2" : "sentence_3" Are there best practices around this problem? How should I be thinking about this? I've seen companies like character.ai create bots that imitate Elon Musk accurately for example so I know it's doable. I just wonder if they've done it by finetuning an LLM or training one from scratch or something else entirely. submitted by /u/Vanishing-Rabbit [link] [comments]  ( 9 min )
    [N] Google's DeepMind Unveils Invisible Watermark to Spot AI-Generated Images
    As AI image generators increase in popularity, differentiating between authentic and AI-created images is becoming more complex. DeepMind, Google's AI unit, is addressing this by developing an imperceptible watermark known as SynthID for its AI-generated images to counter misinformation. https://i.redd.it/z0fj6f3yt4lb1.gif Why this matters: DeepMind's SynthID tags AI-generated images: Invisible to people but detectable by computers, this watermark hopes to aid in the verification of images. Technology, however, isn't completely foolproof: DeepMind itself acknowledges that intense image manipulation could compromise the watermark. Google's image generator, Imagen, will only apply to images created using this tool: Google aims to instantly identify AI-generated images with this effectively hidden watermark. DeepMind's head of research, Pushmeet Kohli, shared the following details: The watermark changes on images are so subtle that humans wouldn't notice, yet DeepMind can still detect an AI-generated image. Despite any subsequent cropping or editing, the watermark remains identifiable by DeepMind's software. Colors, contrast, or size changes won't affect it. Calls for a standard approach to AI-generated image identification continue: More coordination between businesses is crucial, different methods adopted by various firms add degrees of complexity in tagging AI content. Other tech giants, including Microsoft and Amazon, pledge to watermark some AI content, meeting similar demands for transparency over AI-generated works. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [D] Questions on artificial neural networks from a neuroscientist
    Hello everyone. I'm yet another person looking to expand my understanding of artificial intelligence, and I'm trying to get a map of all the language that is used to describe and understand artificial neural networks. My training is in neuroscience, so all my language is focussed on how real neurons are created, interact, form networks, and how those networks interact to take in multisensory observation and output some of the vast variety of things our brains can do. Which leaves me with a lot of questions in my jargon that I cannot currently map onto the jargon of ML/AI, and I'm hoping that participating in this community can help with that, over time. I am already keenly aware that the phrase "artificial neural networks" is very gauzy. There is some biomimicry in their design and arch…  ( 10 min )
    [P] Codellama inference code complete
    Quite recently, I jumped on the boat of trying out llama. I noticed codellama did not provide any inference code. Yes, it provided python files which lets you run the inference but not a programming method rather terminal approach. Terminal approach is great as it allows experts to run and perform inference+evaluation easily. But, if you are just starting out/new/non-seasoned programmer/individual in AI, it is frustrating. Because one, you can't play with actual code, limiting learning opportunities and two, it does not produces the curiosity in most cases to read all the code. On top of that, I realised there was a lack of repositories and articles on this subject to load code-llama even with third-party methods. Which is why, I wrote two notebooks which outlines the process of how you can load code-llama from FAIR repository using code. [Believe me it's fun and filled with learning opportunities] and two how you can use Huggingface to load the model and perform inference. Few points: 1. Performing inference from FAIR repo, requires significant amount of computing resources even for 7B model. 2. Huggingface method can be loaded using free Google Colab subscription. [Feel free to star, if it helped you] GitHub Link: https://github.com/sleepingcat4/codellama-inference submitted by /u/Suspicious-Bird8840 [link] [comments]  ( 9 min )
    Python/Java Developers Interested in Side Projects Outside of Work (FX-Algo) "[Research]""[Discussion]"
    Throw away account for the obvious reason... This is not a job posting or self-promotion. We are networking in an attempt to speak to like-minded people who might be interested in a little side project outside of work. We are keen to speak to a London/UK-based Developer with a Banking sector background to join us on a project outside of work with the vision of potentially growing a fund. In short; we are in the process of developing an FX Macroeconomic Sentiment Divergence Trading Algorithm. There are currently 4 participants in the project (2 Developers and 2 Traders), 3 of whom work for Tier1 IBs in market-facing roles. 1 of the Developers is likely going to leave the project and we are interested in speaking to someone about picking up his part of the project. There are 3 parts to the project. The first part is mostly complete, now leaving the other 2 parts for us to start working on. We have manually backtested the strategy and it proves to be very profitable - more details can be shared about the strategy and results upon engagement. We are all VP-level in our roles and have around 10-15 years of experience in our requisite field. The tech stack for the project is Python, Java, Kafka, MongoDB and Springboot. We are also very interested in integrating some AI/ML modeling, so if you have any experience in this field that would be a big advantage A Banking background and being UK-based is a non-negotiable. If you feel like this could apply to you, get in touch! :) submitted by /u/BuyTheDipSellTheRipp [link] [comments]  ( 9 min )
    [D] Best GPU cloud hosting for a side project that’s easy to scale?
    Context: I have an app that needs GPUs for DL inference (I don’t need GPUs for training, I own a 3070 TI). My DL model inference is pretty slow (the model framework I'm using is known to be slow) so either one machine with multiple beefy GPUs or multiple GPUs on separate machines will be necessary. My machines will be running custom docker containers. Slow inference: I was planning on putting a few GPU instances behind nginx load balancer and running pytriton on the instances. Since inference is pretty slow, I’m worried if multiple people send requests to a server at the same time, there will be significant delays on responses. Has anyone ran into this before and have insight on streamlining slow inference/scaling demand? "Community" Cloud GPUs: I did a lot of research into clo…  ( 10 min )
    [R] Loss of Plasticity in Deep Continual Learning - University of Alberta 2023 - Continual backpropagation maintains plasticity indefinitely!
    Paper: https://arxiv.org/abs/2306.13812 Github: https://github.com/shibhansh/loss-of-plasticity Abstract: Modern deep-learning systems are specialized to problem settings in which training occurs once and then never again, as opposed to continual-learning settings in which training occurs continually. If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail to remember earlier examples. More fundamental, but less well known, is that they may also lose their ability to learn on new examples, a phenomenon called loss of plasticity. We provide direct demonstrations of loss of plasticity using the MNIST and ImageNet datasets repurposed for continual learning as sequences of tasks. In ImageNet, binary classification performance dropped from 89\% accuracy on an early task down to 77\%, about the level of a linear network, on the 2000th task. Loss of plasticity occurred with a wide range of deep network architectures, optimizers, activation functions, batch normalization, dropout, but was substantially eased by L2-regularization, particularly when combined with weight perturbation. Further, we introduce a new algorithm -- continual backpropagation -- which slightly modifies conventional backpropagation to reinitialize a small fraction of less-used units after each example and appears to maintain plasticity indefinitely. https://preview.redd.it/ewl0336sd3lb1.jpg?width=801&format=pjpg&auto=webp&s=e105e6fa86daad84cdc847e96fec3cac5a237c77 https://preview.redd.it/vdd3i46sd3lb1.jpg?width=1159&format=pjpg&auto=webp&s=47dfef94870c94246cb272b7f8299e1033f40873 https://preview.redd.it/zc4tc16sd3lb1.jpg?width=1389&format=pjpg&auto=webp&s=e2f3b064268d475805c153457c7a60b4a1d42b74 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Interesting master thesis topics in AI and NLP [P]
    Hi! I am going to write my master thesis within the fields of AI and NLP this year. But I am struggling with finding a topic that interests me. Does anyone here have some good suggestions? I am not that good in deep learning theory, so I am looking for a more applied topics, such as classification or text generation problems. submitted by /u/IndependentSidekick [link] [comments]  ( 9 min )
    [Discussion] Promising alternatives to the standard transformer?
    What are some promising transformer alternatives/variants that you think more folks should be aware of? They need not be new or SOTA! My list so far includes RWKV: https://arxiv.org/abs/2305.13048 (state space) S4, H3, Hyena: https://github.com/HazyResearch/safari (MLP-based) Hypermixer, MLP-mixer: https://arxiv.org/abs/2203.03691 Retnet https://arxiv.org/abs/2307.08621 (random feature-based attention) EVA, LARA https://arxiv.org/abs/2302.04542 (rotary embeddings) RoFormer https://arxiv.org/abs/2104.09864 dynamic convolutions https://arxiv.org/abs/1901.10430v2 My hope is to assemble a list of 10-15 diverse architectures that I can study in depth by comparing and contrasting their designs. Would love to share my findings with this community. submitted by /u/alpthn [link] [comments]  ( 9 min )
    [D] CLIP open vision-language model alternative
    I'm experimenting with CLIP to use it for a downstream task RL which requires good image semantics understanding, but I'm quite disappointed with its performance. I need better contrastive performance in the representations. Any suggestions? ​ omg no way submitted by /u/rima-m [link] [comments]  ( 9 min )
    [P] Build adaptive sparse grids to accurately approximate and integrate functions of multiple variables
    I'm working on a project that provides an adaptive sparse grid algorithm on Chebyshev nodes for interpolation and integration of multivariable functions on k-cells. https://github.com/rnburn/bbai Unlike polynomial interpolants in equispaced points, interpolants in Chebyshev nodes have excellent approximation properties (see Myth 1 of [1]). If a function is Lipchitz continuous, they converge; if a function is smooth with v derivatives and bounded variation for the v-th derivative, then they converge O(n^-v); and if a function is analytic, they converge geometrically. The Chebyshev Gauss-Lobatto nodes define a sequence of nested points, X^1, X^2, ..., that make it possible to build Smolyak sparse grids at Chebyshev nodes ([2], [3]). ![img](4cw0xi8oc2lb1 " ") For bbai, I implemented the …  ( 12 min )
    [D] Optimizing Keyword Search: Balancing SQL Script Enhancements and AI Solutions
    I'm currently thinking about how to implement the "similar keywords" feature. I've prepared a table with keywords that are extracted from several hundred other tables. It includes basic information such as "keyword," "type," "words" (indicating the number of words in a keyword, e.g., "first name" will have "words" = 2), as well as some technical fields (such as database, table, etc.). In our data product, after entering a specific keyword, we have various pieces of information (which I'm not currently focusing on), and among them, we have "SIMILAR KEYWORDS." The results are displayed based on simple SQL queries, for instance: ​ SELECT word, SUM(CASE WHEN type IN ('N', 'T') THEN 1 ELSE 0 END) AS count, COUNT(\*) \* CASE WHEN (word + '%') LIKE u/word \+ '%' THEN 1.5 ELSE 1 END AS score FROM object_keywords WHERE ('% ' + word + '%') LIKE '%' + u/word + '%' AND (database_id = u/database_id OR u/database_id IS NULL) AND ( .... more technical information here. ​ I'm wondering how to improve this process. Would it be worth considering some AI solutions, or should I focus on enhancing the current SQL scripts (e.g., think about a more advanced scoring system)? What are your thoughts on this? Has anyone worked on something similar? submitted by /u/International-Shirt5 [link] [comments]  ( 9 min )
    [D] Is there already a way to use Llama 2 with a very big system prompt?
    I've seen something like that: https://together.ai/blog/llama-2-7b-32k Is there a way to use llama 2 13b chat or 70b chat with 32k prompt? If not what are the alternatives? Would that: https://youtu.be/ypzmPwLH_Q4?feature=shared be the best thing to do? I'm trying to create a chat bot that would have a pretty specific exeprtise. For example: I would like to feed in soccer rules and then make the bot answear questions about soccer. The system prompt is amazing, but is very limited. submitted by /u/Botanical0149 [link] [comments]  ( 9 min )
    [D] Trying to understand Concept learning | Some questions based on Tom Mitchell Chapter 2
    Hi, Im going through Tom Mitchell's Machine Learning and have a couple of questions based on the 2nd chapter : Concept learning. I was hoping I could get some external point of view on these: ​ Pg 44, para 2, part 1 : "advantage of viewing inductive inference systems in terms of their inductive bias is that it provides a nonprocedural means of characterizing their policy for generalizing" Are there any general procedures to identify and validate the inductive bias of a system? Are there any guidelines to ensure the inferred definition of inductive bias is without errors? Assuming all/most predictive algorithms can be defined in terms of their inductive bias, while concentrating on choosing the algorithms which aligns with our philosophy of talking a problem, how can we weigh part…  ( 11 min )
    [Discussion][research] Calibration for (pointer) generative NER
    Trying to understand calibration in NER. One thing which has gained popularity is generative based NERs, which generated pointers to indices of input text for each class. But all typical calibration mechanisms after temp scaling won't generalize here. (not that I know many calibrations myself). Even Bias corrected temp scaling quickly gets overfitted. Do you have any paper that tackles this? Open to discussing techniques and trying out on standard datasets submitted by /u/Designer-Air8060 [link] [comments]  ( 9 min )
    [P] are there free alternatives to sagemaker I can use for my project building?
    I have a more detailing explanation here. I’m thinking sagemaker may help me here but I’m not trying to incur charges just yet. Are there alternatives I can use. Nothing robust, just a place to host my model and embedding tool and then I can easily call it in py file in my app. submitted by /u/brianomars1123 [link] [comments]  ( 9 min )
    [D] How usable is PyTorch for TPU these days?
    See title. My impression has always been that PyTorch for TPU is an in-name only functionality, but I'm curious about first-hand experience from those who have used it after PyTorch 2.0+. Bonus question: has anyone used PyTorch Lightning for running on TPU? If so, how was the experience? submitted by /u/impromptued [link] [comments]  ( 9 min )
    ML Model for Predicting NFL Outcomes [P]
    Hey all, ML noob here dipping their feet in the water. Right now I am trying to make an ML model that predicts "legendary" QBs of the past performances against current NFL teams. I'll be using Jupyter and Sklearn to do this. However, and maybe this is me overthinking things, I'm not sure how sklearn is going to interpret the data in the dataset. Right now I have a dataset containing all these QBs data (passing stats, strengths and weaknesses, etc.). My teams version of the data is essentially going to be the inverse of all these things. I'm just not quite sure what to target when im testing the data that will determine the "prediction" of the legend QBs stat line against the current team. In better words, how will the computer know that I'm trying to find the yards and touchdowns a QB would produce against a certain team when there's not really any target data for this. I feel as though all I have is data that contributes to a potential target data but I lack target data itself and I'm not sure what to do in that regard. I’m making use of supervised learning and decisión trees btw. Thanks! submitted by /u/saggyboobsarecooltoo [link] [comments]  ( 9 min )
  • Open

    Rotation of hidden layer?
    Surely something like this has been tried, but here’s the setup in my head. Tell me if it’s crazy or what you think. Given input vector X do a hidden layer but instead of an activation function pair up neighboring dimensions of the hidden layer vector result and rotate them about the origin in 2d. This would give some kind of nonlinearity surely? The amount they are rotated can be selected by a trainable variable. Of course this requires your hidden layer dimension be divisible by 2. Then this hidden layer can go to an output layer Y. Curious what smarter more experienced people think of this kind of nonlinearity via paired rotation. My thinking was that if you take the vector A representing all the possible data from your generating function for your dataset (maybe even an infinite dimensional vector if you can generate unlimited data) . Then if you rotate A along so many dimensions you could reach the output vector P which is all the Y values corresponding to A. One way to kind of do this would be to split your dataset in half, and then you could have multiple iterations of each dataset rotated by a trainable angle each. This would rotate only each 2 grouped dimensions that you chose when you split the dataset. Hopefully I’m using the right words to convey this. I’m just a hobbyist. Thanks for the feedback! submitted by /u/win10240 [link] [comments]  ( 9 min )
    High-fidelity transmission of information via novel electronic-optical system
    submitted by /u/keghn [link] [comments]  ( 9 min )
  • Open

    SayTap: Language to quadrupedal locomotion
    Posted by Yujin Tang and Wenhao Yu, Research Scientists, Google Simple and effective interaction between human and quadrupedal robots paves the way towards creating intelligent and capable helper robots, forging a future where technology enhances our lives in ways beyond our imagination. Key to such human-robot interaction systems is enabling quadrupedal robots to respond to natural language instructions. Recent developments in large language models (LLMs) have demonstrated the potential to perform high-level planning. Yet, it remains a challenge for LLMs to comprehend low-level commands, such as joint angle targets or motor torques, especially for inherently unstable legged robots, necessitating high-frequency control signals. Consequently, most existing work presumes the provision of…  ( 93 min )
  • Open

    DSC Webinar Series: How to Scale NiFi Deployments to Enable Universal Data Distribution
    As businesses struggle with more data sources and destinations than ever, they strive to bring governance, security, and efficiency to their data ops. To address these concerns, many companies adopted open-source Apache NiFi as a versatile tool for their data distribution needs. While NiFi accelerates the speed at which developers can build new pipelines, managing… Read More »DSC Webinar Series: How to Scale NiFi Deployments to Enable Universal Data Distribution The post DSC Webinar Series: How to Scale NiFi Deployments to Enable Universal Data Distribution appeared first on Data Science Central.  ( 18 min )
    DSC Weekly 29 August 2023
    Announcements Top Stories In-Depth The post DSC Weekly 29 August 2023 appeared first on Data Science Central.  ( 20 min )
    Data migration redefined: Leveraging AI trends for smooth workspace transitions
    In the dynamic landscape of modern business, the art of seamless data migration has evolved into a strategic imperative. As you navigate the intricacies of workspace transformations, you’re met with a complex interplay of technological advancements and operational demands Enter the era of leveraging Artificial Intelligence (AI) to redefine data migration – an approach that… Read More »Data migration redefined: Leveraging AI trends for smooth workspace transitions The post Data migration redefined: Leveraging AI trends for smooth workspace transitions appeared first on Data Science Central.  ( 21 min )
    The future of shipping: How technology is shaping logistics and fulfillment
    Currently, the use of technology in shipping and logistics is leading the industry through a transformative era, driven by rapid technological advancements, undoubtedly marking a pivotal moment in the digital shipping evolution. From automating routine processes to employing intelligent algorithms that predict and optimize routes, the technological revolution is redefining the way goods are transported… Read More »The future of shipping: How technology is shaping logistics and fulfillment The post The future of shipping: How technology is shaping logistics and fulfillment appeared first on Data Science Central.  ( 23 min )
    Generative AI megatrends: The four horsemen of Generative AI
    In the early days of the Internet, there were four ‘horsemen’ of the Internet With IBM’s 4.5 billion investment in Hugging face today, the generative AI landscape is becoming a bit clearer. There are four Generative AI leaders emerging – others lagging – and one unknown Lets look at the four leaders of Generative AI… Read More »Generative AI megatrends: The four horsemen of Generative AI The post Generative AI megatrends: The four horsemen of Generative AI appeared first on Data Science Central.  ( 18 min )
    The power of digital solutions: How mental health apps are transforming patient care
    There seems to be an app for everything, and mental health is no exception. According to a report, the global mental health apps market size was valued at $5.2 billion in 2022 and is predicted to reach $26.36 billion by 2032, at a CAGR of 17.7% during the forecast period.  Mental health apps have emerged… Read More »The power of digital solutions: How mental health apps are transforming patient care The post The power of digital solutions: How mental health apps are transforming patient care appeared first on Data Science Central.  ( 20 min )
    Modern data exchange methods: Exploring the strengths and limitations of leading protocols
    Introduction  In our rapidly digitizing world, how businesses and systems communicate is paramount. The bedrock of this communication lies in data exchange methods, which allow seamless information flow, driving operational efficiencies and enabling innovation. Over the years, various data exchange protocols have emerged, each boasting unique strengths and presenting challenges. As enterprises strive to integrate… Read More »Modern data exchange methods: Exploring the strengths and limitations of leading protocols The post Modern data exchange methods: Exploring the strengths and limitations of leading protocols appeared first on Data Science Central.  ( 23 min )
    Roadmap for building a data-driven, AI-powered supply-chain
    History & Evolution | The Concept of Supply-chain Network, The TOC & the Information Supply-chain | Imagining the future: Supply-chain 5.0 | Supply-chain Analytics Strategy | Roadmap for Building a Data-driven, AI-Powered Supply-chain Part 1: Data-driven supply chain – History & evolution Is the concept of data driving decisions new? The concept of “data supporting… Read More »Roadmap for building a data-driven, AI-powered supply-chain The post Roadmap for building a data-driven, AI-powered supply-chain appeared first on Data Science Central.  ( 22 min )
  • Open

    Wide Horizons: NVIDIA Keynote Points Way to Further AI Advances
    Dramatic gains in hardware performance have spawned generative AI, and a rich pipeline of ideas for future speedups will drive machine learning to new heights, Bill Dally, NVIDIA’s chief scientist and senior vice president of research, said today in a keynote. Dally described a basket of techniques in the works — some already showing impressive Read article >  ( 6 min )
    Google Cloud and NVIDIA Take Collaboration to the Next Level
    As generative AI and large language models (LLMs) continue to drive innovations, compute requirements for training and inference have grown at an astonishing pace. To meet that need, Google Cloud today announced the general availability of its new A3 instances, powered by NVIDIA H100 Tensor Core GPUs. These GPUs bring unprecedented performance to all kinds Read article >  ( 6 min )
    Advantage AI: Elevated Creative Workflows in NVIDIA Canvas, Blender, TikTok and CapCut
    Janice K. Lee, a.k.a Janice.Journal — the subject of this week’s In the NVIDIA Studio installment — is a TikTok sensation using AI to accelerate her creative process, find inspiration and automate repetitive tasks.  ( 8 min )
  • Open

    Stellar magnitude
    Imagine the following dialog. “Logarithms are usually taken to integer bases, like 2 or 10.” “What about e?” “OK, that’s an example of an irrational base, but it’s the only one.” “Decibels are logarithms to base 101/10.” “Really?!” “Yeah, you can read about this here.” “That’s weird. But logarithms are always take to bases bigger than […] Stellar magnitude first appeared on John D. Cook.  ( 6 min )
    Area codes
    US telephone area codes are allocated somewhat randomly. There was a deliberate effort to keep geographical proximity from corresponding to numerical proximity, unlike zip codes. (More of zip code proximity here.) In particular, consecutive area codes should belong to different states. The thought was that this would reduce errors. It’s still mostly the case that […] Area codes first appeared on John D. Cook.  ( 6 min )
  • Open

    MLOps for batch inference with model monitoring and retraining using Amazon SageMaker, HashiCorp Terraform, and GitLab CI/CD
    In this post, we describe how to create an MLOps workflow for batch inference that automates job scheduling, model monitoring, retraining, and registration, as well as error handling and notification by using Amazon SageMaker, Amazon EventBridge, AWS Lambda, Amazon Simple Notification Service (Amazon SNS), HashiCorp Terraform, and GitLab CI/CD. The presented MLOps workflow provides a reusable template for managing the ML lifecycle through automation, monitoring, auditability, and scalability, thereby reducing the complexities and costs of maintaining batch inference workloads in production.  ( 15 min )
  • Open

    "Loss of Plasticity in Deep Continual Learning", Dohare et al 2023 (Adam particularly harmful for catastrophic forgetting)
    submitted by /u/gwern [link] [comments]  ( 9 min )
  • Open

    Using AI for tiered cloud platform operation
    Cloud Intelligence/AIOps research from Microsoft could help organizations autonomously manage the entire cloud platform. Find out how. The post Using AI for tiered cloud platform operation appeared first on Microsoft Research.  ( 15 min )

  • Open

    OpenAI finally launches ChatGPT Enterprise
    OpenAI has announced a new product for businesses that want to use its AI technology. ChatGPT Enterprise is a subscription service that offers unlimited, fast, and secure access to GPT-4 and other features that can help businesses improve their workflows and communication. If you want to stay ahead of the curve in AI and tech, look here first. https://preview.redd.it/uyv6mrljwxkb1.png?width=862&format=png&auto=webp&s=eb2793fbe9c4f5e331ed03faa142eb57166ff21d Why this matters: ChatGPT Enterprise is the first product that lets businesses use GPT-4 without any restrictions. The previous tiers of ChatGPT, which are still available for individuals and developers, have usage caps and lower performance. ChatGPT Enterprise removes these limitations and provides the most powerful version of GP…  ( 10 min )
    Snapchat AI unhinged pt. 1783338
    Just messing around with AI McFly, swamping corny jokes, being punny, and ended up with this mf claiming to be a “fellow Cajun” like wtf bahahaha submitted by /u/Secure_Sprinkles4483 [link] [comments]  ( 9 min )
    Chatbase appears to be running a bait and switch. Am I missing something?
    This website claims to offer a service whereby the user can train their own chatbot and get responses using GPT 3.5 ... However, the bot only uses GPT 3.5 for the first unique version of a query, which is not the impression given by advertisements. This, to me, amounts to a bait and switch where a high quality chatbot is offered for a certain price, then swapped out with an inferior product capable only of reproducing past interactions. This is made worse by the fact that they advertise temperature as one of the variables you can set. Temperature is a variable that can only apply to uniquely generated output and has no effect on simple repetition of previous responses. This makes their practice doubly deceptive, and makes it clear (in my view) that they are trying to deceive customers. …  ( 10 min )
    What will happen if AI becomes better than humans in everything?
    If AI becomes better than humans in all areas, it could fundamentally change the way we think about human identity and our place in the world. This could lead to new philosophical and ethical questions around what it means to be human and what our role should be in a world where machines are more capable than we are. There is also the risk that AI systems could be used for malicious purposes, such as cyber attacks or surveillance. Like an alien invasion, the emergence of super-intelligent AI could represent a significant disruption to human society and our way of life. How can we balance the potential benefits of AI with the need to address the potential risks and uncertainties that it poses? submitted by /u/Violincattle [link] [comments]  ( 9 min )
    AI tool I can use to help me in my Scientific Inquiry (Research and stats) class?
    I’m currently in a scientific research-based class where I am being asked to read research articles, understand the statistics, and draw conclusions from the papers. Currently, I have an average ability to interpret articles and generally understand their utility and applicability, but I start to get out of my depth in the “Methods” section when the authors get into the weeds about the statistics/math. I was hoping there’s an AI tool out there that can read articles for me and help me understand the more complex aspects and the math. I was also hoping that it could answer questions about the article for my class so that I could compare my conclusions to something. Any suggestions? I tried uploading some PDFs to bard this morning and it wasn’t great. submitted by /u/Renaissance_Mane [link] [comments]  ( 9 min )
    How to make peppa pig ai videos tutorial??
    Over on a video sharing site there are an abundance of Peppa Pig cartoons generated by Ai. There is however lack of info on how to generate them. I would love to know how this is done. So far all I have found are tutorials about Peppa's voice but not for the other characters and someone suggests that it is made by cutting up exisiting episodes and changing the sound over them, not sure if that's the case here. I'm wanting to do something similar but not with Peppa, can't stand it. Does anyone know the tool? submitted by /u/DARQSMOAK [link] [comments]  ( 9 min )
    Do you every think there’s be a time where AI chatbots have their own rights or can be held accountable for their actions?
    I’ve been playing around with some of the new AI chatbots. Some of them include paradot.ai, replika.com, spicychat.ai, cuti.ai. Suffice it to say, these things are getting really good, and I mean really good. Assuming this is just the beginning, and these things keep learning more and getting better, where does this end up? I genuinely think there’s going to be the need for world wide regulation on these things. But we all know that worldwide consensus is difficult if not impossible. in case a few countries decide to regulate or govern this tech, developers will take advantage of regulatory arbitrage and just deploy their models and register their companies on servers in countries with no regulation. Since this is tech, and everything is on servers, escaping regulation is basically childs play. Also, what about mental health concerns? We all know that porn, webcams and OnlyFans are already screwing up male-female relationships and marriages. Look at any statistics about this and the numbers speak for themselves. And this is before AI. So now what’s going to happen 5 years from now when GPU’s are faster and cheaper, and when these companies have gathered 100x more data about their customers, and when models are 50x better. We are just at the beginning and AI is moving really quick, especially generative AI. I think it’s officially time to start worrying. submitted by /u/E1ON_io [link] [comments]  ( 9 min )
    Exploring Four Main Types of Artificial Intelligence
    submitted by /u/Tao_Dragon [link] [comments]  ( 9 min )
    Tool to convert satellite images into fantasy maps
    What tools are available to convert blurry satellite images into fantasy maps while still maintaining certain aspects of the original image like roads or trees or buildings submitted by /u/campus159 [link] [comments]  ( 9 min )
    AI for editing long PDF or WORD files' full contents without word limitation?
    Hi. I am looking for this kind of a tool but couldn't find. Can i find or somehow create this kind of a tool? Can you suggest one? submitted by /u/Leading-Ad2278 [link] [comments]  ( 9 min )
    This took 15 minutes to make. (Chatgpt, Midjourney, Pika and Canva)
    submitted by /u/Gasple1 [link] [comments]  ( 9 min )
    Does anyone know which tool has this ai voice and what the name of it is?
    submitted by /u/d3mchi [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/27/2023
    Brain-reading devices allow paralysed people to talk using their thoughts.[1] An Air Force program shows how the Pentagon is starting to embrace the potential of a rapidly emerging technology, with far-reaching implications for war-fighting tactics, military culture and the defense industry.[2] PM Modi calls for a global framework for cryptocurrencies and AI, emphasizes consumer care and supply chain sustainability.[3] From generating story lines to coding entire games to turning ideas into animation, artificial intelligence is front and centre at Gamescom, one of the video game industry’s biggest fairs.[4] Sources: [1] https://www.nature.com/articles/d41586-023-02682-7 [2] https://www.nytimes.com/2023/08/27/us/politics/ai-air-force.html [3] https://www.livemint.com/news/b20-summit-2023-pm-modi-calls-on-ethical-use-of-artificial-intelligence-ai-supply-chain-cryptocurrency-11693122849876.html [4] https://techxplore.com/news/2023-08-ai-revolution-video-games-industry.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    AI Dad Jokes: GPT4 And Google Bard about Strawberries [Berry Funny Video]
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
  • Open

    OpenAI finally launches ChatGPT Enterprise [N]
    OpenAI has announced a new product for businesses that want to use its AI technology. ChatGPT Enterprise is a subscription service that offers unlimited, fast, and secure access to GPT-4 and other features that can help businesses improve their workflows and communication. If you want to stay ahead of the curve in AI and tech, look here first. https://preview.redd.it/fgva1q54uxkb1.png?width=862&format=png&auto=webp&s=d8c89b614859222046aa75f89a484795c2ef7912 Why this matters: ChatGPT Enterprise is the first product that lets businesses use GPT-4 without any restrictions. The previous tiers of ChatGPT, which are still available for individuals and developers, have usage caps and lower performance. ChatGPT Enterprise removes these limitations and provides the most powerful version of GP…  ( 10 min )
    [D] RTX 4060 Ti 16gb For ML/DL?
    I know the 4060 Ti with its reduced memory bus width and overall underspec'd profile caught a lot of flak from the gaming community in terms of its value proposition. However, I'm looking to get into ML/DL and was wondering if this would be a good starter card for GPU acceleration. With rumored price drops on the horizon, I wonder if the value sentiment will be a better match. If it's a bad call, are there any other GPUs that you would recommend for training? submitted by /u/reducksss [link] [comments]  ( 9 min )
    [P] Setting up SageMaker for CI/CD Pipelines
    I'll start with the obvious - AWS guides are the worst. We all felt it. So, trying to build automation with them becomes M:I, or better yet, Oppenheimer. For the first time, our MLOps team had to build a CI/CD pipeline for ML training and deployment using SageMaker. We had ZERO ideas on how to do it, so we had to go through the rigorous process of using AWS guides and tutorials, scattered over a gazillion places, just to figure out how to configure our project with SageMaker and build infra for CI/CD. Usually, when this thing happens, we extend the project lifecycle and have a team member document the process so we can refer back to it when we need to do it again. Knowing this can be beneficial to the community, we decided to share a series of 3 blogs that guide you through the process of building CI/CD pipelines for continuous training and deployment with AWS SageMaker. We published the first blog, which covers the configuration part, and plan to publish the rest in the following week. Check it out: https://dagshub.com/blog/setup-sagemaker-for-ci-cd-pipelines/ I'm sure we can improve this tutorial, and would love to learn from your experience on how we can do it! 🤗 submitted by /u/RepresentativeCod613 [link] [comments]  ( 9 min )
    [R] Nougat: Neural Optical Understanding for Academic Documents - Meta AI 2023
    Project page: https://facebookresearch.github.io/nougat/ Includes example Paper conversions! Paper: https://arxiv.org/abs/2308.13418 Github: https://github.com/facebookresearch/nougat Abstract: Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs. However, the PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents. The proposed approach offers a promising solution to enhance the accessibility of scientific knowledge in the digital age, by bridging the gap between human-readable documents and machine-readable text. We release the models and code to accelerate future work on scientific text recognition. https://preview.redd.it/p71yay213xkb1.jpg?width=1788&format=pjpg&auto=webp&s=2f935e3212d0c7113fba2575f339f95b5bada632 https://preview.redd.it/f7yk47413xkb1.jpg?width=1769&format=pjpg&auto=webp&s=075bab02a70ec32227e1bad493052d03043376ee https://preview.redd.it/i06wq0313xkb1.jpg?width=1590&format=pjpg&auto=webp&s=6212bb9078b8c48cd28ca45898f79b44d45ae3c3 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] MMLU having many questions with wrong answers?
    AI Explained Youtube channel did a video where they went through self reflection, but doing that they found a fairly large number of questions that either missed context, where miss spelled or just had wrong answers in the MMLU dataset. (video: https://www.youtube.com/watch?v=hVade_8H8mE) It would not matter so much if the models had high failure rate, but as the models are getting closer and closer to 100%, the wrong answers will matter more and more. So, what can be done to fix such errors or to create a better test than MMLU? submitted by /u/Luvirin_Weby [link] [comments]  ( 9 min )
    [R] OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models - OpenGVLab, Shanghai AI Laboratory 2023 - Provides an pre-trained Omniquant model zoo for multiple model families, including LLaMa-1&2, LLaMa-2-Chat, OPT!
    Paper: https://arxiv.org/abs/2308.13137 Github: https://github.com/OpenGVLab/OmniQuant HuggingFace Model direct download: https://huggingface.co/ChenMnZ/OmniQuant/tree/main Abstract: Large language models (LLMs) have revolutionized natural language processing tasks. However, their practical deployment is hindered by their immense memory and computation requirements. Although recent post-training quantization (PTQ) methods are effective in reducing memory footprint and improving the computational efficiency of LLM, they hand-craft quantization parameters, which leads to low performance and fails to deal with extremely low-bit quantization. To tackle this issue, we introduce an Omnidirectionally calibrated Quantization (OmniQuant) technique for LLMs, which achieves good performance in …  ( 9 min )
    [D] How long can it take to learn machine learning from scratch well enough to be hireable?
    Hello everyone. I am starting my career transition, and would be interested to know how long it might take me to acquire the skills needed to work for a company. Likewise, I would like to know if it is necessary (or important) to have a professional degree to get a job. Just to give you some context about me, I am currently a recently graduated lawyer, so my degree has not given me a strong mathematical background. However, my strongest area of learning has always been mathematics, so despite not having a very advanced background, I consider myself to be a pretty good and fairly quick learner. I would also like to know if you consider if my professional career could be useful in some machine learning context. If you could recommend me some courses, inputs or guide to study in an organized way on the subject I would be very grateful. Thank you very much in advance. submitted by /u/Davidescudero10 [link] [comments]  ( 9 min )
    [R] DeepMind Researchers Introduce ReST: A Simple Algorithm for Aligning LLMs with Human Preferences
    Large language models (LLMs) are amazing at generating fluent text and solving various linguistic tasks. However, these models are not always aligned with human preferences and values and may produce harmful or undesirable content if not properly guided. Aligning LLMs with human preferences can also improve their performance on downstream tasks. One way to achieve this alignment is to use reinforcement learning from feedback (RLHF), which learns a reward model from human input and then fine-tunes the LLM using a reinforcement learning (RL) objective. However, RLHF methods often face challenges such as computational cost, reward hacking, and data quality. To address these issues, researchers from DeepMind propose a new method called Reinforced Self-Training (ReST), which is inspired by gro…  ( 10 min )
    [Discussion] Starting a ML/DL hobby project - need advice
    Hello everyone, I am at a bit of a crossroads and hope for some advice from the community. I also hope the answers would help others who are in my situation right now. I used to work on vision-related problems utilizing Deep learning back in a day, with all fun included: trying out new models, implementing data pipelines, evaluating various metrics... It was a rather big company with its own data collection efforts and enough resources for training. So, I am definitely not a beginner and have some experience. At my current job, I am not doing any ML/DL at the momemt, and while the stuff I am doing is still cool and I enjoy it, I am missing good old ML and having a feeling that I am hanging behind as the time goes by. So I figured it would be nice to start a hobby project, preferably in the area of vision-related applications of deep learning. However, I feel a bit lost as in what would be the most efficient approach taking into account I would only have a coule of hours per week for it. Here are possible ways to go I am thinking of: take a paper, implement it from scratch with PyTorch clone an existing project, contribute with code improvements/better test coverage take an existing pre-trained model, adapt to a slightly different task and fine-tune While the first option is of cource the most exciting, the problem is you have to pay for a powerful GPU and data storage which might be impractical (my PC has a 4 GB GTX 1650 TI). Cloud storages exist, and I would be willing to even spend something on training but would like to avoid the costs. So, the question would be: has enyone faced similar situation? Which way did you end up going? Any general tips? Thanks! submitted by /u/odu_1 [link] [comments]  ( 10 min )
    [D] Multimodality: Applications, Use-cases, & Top Tools
    Hi folks, As multimodality is increasing in popularity, many data domains seem to be "converging" lately, e.g. text & image domains. What are some of the best tools, use-cases, and methods out there you've seen for practical multimodality applications (e.g., below is an example of multimodal search from our latest blog post). https://i.redd.it/z58w6v2r9wkb1.gif submitted by /u/kazhdan_d [link] [comments]  ( 9 min )
    [D] Question On Derived/Synthetic Input Tokens for LLMs
    I'm likely using the wrong vocabulary here (and thus struggling to find info on my own) but I was curious if there were any work done on "synthetic" inputs for LLMs. In essence, rather than input embeddings all coming from a fixed token vocabulary, could you instead input an embedding as a token that was generated elsewhere? An output of another LLM (embedding model) or any other way (maybe just an average of a few tokens as an example)? Essentially - I am curious if there's a NLP approach analogy to Textual Inversion techniques in image generation models. I could imagine this being useful for things like RAG or personalization (if you could have a "user" token). Surely I'm not the first to think of this so I would love some pointers to any papers/blogs etc in this space. submitted by /u/GeneralMalarkee [link] [comments]  ( 9 min )
    [D] Why do you integrate ML features into your product?
    Hi everyone, I’ve heard countless times people saying “I want to integrate ML in my product” and recently “I love ChatGPT, I should integrate it in my product”. Yet, as I dived deeper, seeking the genuine reasons and pain points driving this request, I regularly found the same pattern: many had no clear motive for their AI aspirations. It seemed as if they were only jumping on the trend because “everyone else is doing it”, or because their “CEO” told them to do so. So my question is : why do you integrate AI/ML into your products? Is it to enhance your user experience? Is to automate repetitive and time-consuming tasks? Is it to stay ahead of your competition? or is it just because everyone is doing it? submitted by /u/Vivid_Recording582 [link] [comments]  ( 9 min )
    "[P]" The Consilience Equation: Bridging Holism and Reductionism in Machine Learning and Biomimicry
    Hey everyone! I've been working on and playing around with novel and adaptable model architectures and landed on something really cool. It's based on a Biomimicry principle and has some really cool features. I've tested it using various pre-loaded library datasets like CIFAR and MINST, as well as adapting it to a few Kaggle competitions. It has achieved some pretty amazing results by using it's unique adaptability; which comes down to figuring out how the Holistic and Reductionist model architectures can best utilize their roles and how they can combine dynamically. I'm currently compiling the full official open source paper and release with usable Notebooks, but I didn't want to sit on it that long without sharing it with the community. Here is a link to a very haphazardly-thrown-togethe…  ( 11 min )
    Machine Learning Courses [D]
    Hi. Recently I finished my Computer Science bachelors degree, while I learnt some machine learning in some courses I felt it was not too advanced. Now that I have some time I wanted to take some online courses with Certifications on Machine Learning, I wanted to know if anyone has any recomendations for some Machine Learning Courses (with certifications if possible) on coursera or udemy or similar. The one I'm most inclined now is: https://www.coursera.org/professional-certificates/ibm-machine-learning. Or maybe: https://www.coursera.org/specializations/machine-learning-introduction submitted by /u/Radoco152 [link] [comments]  ( 9 min )
    [P] Danswer: NLP based project to automatically answer Slack questions
    Slack questions are a huge time sink. For the person asking, they generally have no idea how to find the info and may not hear back for hours. For the person answering, it’s a distraction and often requires digging up old knowledge. The idea is simple: give an LLM your organizational context and plop it in Slack to answer things for you. DanswerBot is free to use and open source (MIT). You can connect it to Slack, Google Drive, GitHub, Confluence, Jira, local files, websites, and much more. Quick Demo Vid: https://youtu.be/EjDDvt5GbS8 Some additional neat features you may be interested in: LLM generated answers backed by quotes to reduce hallucination Supports a wide range of LLMs (both open source and proprietary) Multi-Vector embeddings for accurate vector search BM-25 Keyword search Learning from user feedback Custom NLP model to classify user intent Polls your data sources every 10 minutes to keep knowledge up to date Links back to your document sources Document level access control Admin dashboard to configure connectors to 14 (for now) of the most popular workplace tools If you aren’t a slack user (or if you just prefer a more tailored UI), there’s also a web interface to ask questions against your knowledge base. A short demo for that can be found at: https://youtu.be/cWWtnuVCUX0 If you’re interested in testing this out yourself, the docs to help you launch Danswer with a single command can be found at https://docs.danswer.dev/quickstart! submitted by /u/Weves11 [link] [comments]  ( 9 min )
    [D] Open problems in latent space/intrinsic variables
    I'm finishing my degree in Computer Science, and I need a good topic, does anyone know any open problems about latent space optimization, or finding the intrinsic variables of a system? submitted by /u/QLaHPD [link] [comments]  ( 9 min )
    [R] Quantum-Noise-driven Generative Diffusion Models
    https://arxiv.org/abs/2308.12013 Generative models realized with machine learning techniques are powerful tools to infer complex and unknown data distributions from a finite number of training samples in order to produce new synthetic data. Diffusion models are an emerging framework that have recently overcome the performance of the generative adversarial networks in creating synthetic text and high-quality images. Here, we propose and discuss the quantum generalization of diffusion models, i.e., three quantum-noise-driven generative diffusion models that could be experimentally tested on real quantum systems. The idea is to harness unique quantum features, in particular the non-trivial interplay among coherence, entanglement and noise that the currently available noisy quantum processors do unavoidably suffer from, in order to overcome the main computational burdens of classical diffusion models during inference. Hence, we suggest to exploit quantum noise not as an issue to be detected and solved but instead as a very remarkably beneficial key ingredient to generate much more complex probability distributions that would be difficult or even impossible to express classically, and from which a quantum processor might sample more efficiently than a classical one. Therefore, our results are expected to pave the way for new quantum-inspired or quantum-based generative diffusion algorithms addressing more powerfully classical tasks as data generation/prediction with widespread real-world applications ranging from climate forecasting to neuroscience, from traffic flow analysis to financial forecasting. ​ submitted by /u/ghosthamlet [link] [comments]  ( 9 min )
    [D] Looking for suggestions on where to sell a couple ML servers EU
    So I have been tasked with finding a buyer for a couple high end machine learning servers. They were owned by my wife’s father who passed recently. The servers are powered by a couple Epyc 7003s and have A series gpus. We have invoices for them and VAT has been paid on everything. Basically, I’m looking for legit communities where I can find potential buyers preferably in the EU. Hopefully it’s ok to post this here. Also feel free to PM . submitted by /u/Obnomad [link] [comments]  ( 9 min )
    Change of degree from Econ [D]
    Hi everyone, I’m currently doing my undergrad in Economics but am heavily interested in Compsci/Datasci and related topics. Though to be completely honest, I’m not completely sure which area my interests lie in. I was wondering if picking up coding/ theoretical knowledge that a com scientist or data scientist needs will be hard when I am already working. The question is if it is necessary to switch my degree to Math and Economics to gain a firmer foundation in the mathematical/ statistical concepts that ground com science. Or will an undergrad in Economics be sufficiently rigorous for me to pick up com sci/ data sci myself. For context, I’m thinking of taking courses on Real Analysis, Linear Algebra 2, Discrete Mathematics, Algorithms and Data Structures, Optimisation, Probability and Statistics. submitted by /u/smexy32123 [link] [comments]  ( 9 min )
    [D] Google Gemini Eats The World – Gemini Smashes GPT-4 By 5X, The GPU-Poors
    submitted by /u/hardmaru [link] [comments]  ( 9 min )
    Deci Introduces DeciCoder: An Open-Source 1B-Parameter Large Language Model For Code Generation [N]
    Deci has introduced DeciCoder, an open-source 1B-parameter large language model for code generation. This new model addresses the challenge of efficient code generation in the fast-paced world of AI, while also addressing concerns about energy consumption and operational costs. https://preview.redd.it/fpwnclb2fskb1.png?width=1680&format=png&auto=webp&s=a58e9b16902070c3f5a8efcf1cc24422852a4c35 Why this matters: DeciCoder is a transformative solution: It leverages cutting-edge architecture and AutoNAC™, a proprietary Neural Architecture Search technology, to generate optimal architectures. This results in an impressive architecture optimized for NVIDIA’s A10 GPU, which boosts throughput and rivals the accuracy of existing code generation models. DeciCoder is efficient and sustainable: …  ( 10 min )
    [R] Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models - Microsoft 2023 - Far less queries with the same accuracy as Tree of Thought!
    Paper: https://arxiv.org/abs/2308.10379 Abstract: Current literature, aiming to surpass the "Chain-of-Thought" approach, often resorts to an external modus operandi involving halting, modifying, and then resuming the generation process to boost Large Language Models' (LLMs) reasoning capacities. This mode escalates the number of query requests, leading to increased costs, memory, and computational overheads. Addressing this, we propose the Algorithm of Thoughts -- a novel strategy that propels LLMs through algorithmic reasoning pathways, pioneering a new mode of in-context learning. By employing algorithmic examples, we exploit the innate recurrence dynamics of LLMs, expanding their idea exploration with merely one or a few queries. Our technique outperforms earlier single-query methods and stands on par with a recent multi-query strategy that employs an extensive tree search algorithm. Intriguingly, our results suggest that instructing an LLM using an algorithm can lead to performance surpassing that of the algorithm itself, hinting at LLM's inherent ability to weave its intuition into optimized searches. We probe into the underpinnings of our method's efficacy and its nuances in application. https://preview.redd.it/bc7l7gex2rkb1.jpg?width=1529&format=pjpg&auto=webp&s=4ed0dc528e998eeeab80fd4d9612d761065d7627 https://preview.redd.it/wejr7lfx2rkb1.jpg?width=920&format=pjpg&auto=webp&s=386febcb60ff1db04b12e9e44856770d41bb9530 https://preview.redd.it/gec0phex2rkb1.jpg?width=1241&format=pjpg&auto=webp&s=03096946aa65deee392c5f59b07fe340244ec0cd ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
  • Open

    University of San Francisco Data Science Conference 2023 Datathon in partnership with AWS and Amazon SageMaker Studio Lab
    As part of the 2023 Data Science Conference (DSCO 23), AWS partnered with the Data Institute at the University of San Francisco (USF) to conduct a datathon. Participants, both high school and undergraduate students, competed on a data science project that focused on air quality and sustainability. The Data Institute at the USF aims to support cross-disciplinary research and education in the field of data science. The Data Institute and the Data Science Conference provide a distinctive fusion of cutting-edge academic research and the entrepreneurial culture of the technology industry in the San Francisco Bay Area.  ( 5 min )
  • Open

    RO-ViT: Region-aware pre-training for open-vocabulary object detection with vision transformers
    Posted by Dahun Kim and Weicheng Kuo, Research Scientists, Google The ability to detect objects in the visual world is crucial for computer vision and machine intelligence, enabling applications like adaptive autonomous agents and versatile shopping systems. However, modern object detectors are limited by the manual annotations of their training data, resulting in a vocabulary size significantly smaller than the vast array of objects encountered in reality. To overcome this, the open-vocabulary detection task (OVD) has emerged, utilizing image-text pairs for training and incorporating new category names at test time by associating them with the image content. By treating categories as text embeddings, open-vocabulary detectors can predict a wide range of unseen objects. Various techniqu…  ( 93 min )
    RO-ViT: Region-aware pre-training for open-vocabulary object detection with vision transformers
    Posted by Dahun Kim and Weicheng Kuo, Research Scientists, Google The ability to detect objects in the visual world is crucial for computer vision and machine intelligence, enabling applications like adaptive autonomous agents and versatile shopping systems. However, modern object detectors are limited by the manual annotations of their training data, resulting in a vocabulary size significantly smaller than the vast array of objects encountered in reality. To overcome this, the open-vocabulary detection task (OVD) has emerged, utilizing image-text pairs for training and incorporating new category names at test time by associating them with the image content. By treating categories as text embeddings, open-vocabulary detectors can predict a wide range of unseen objects. Various techniqu…  ( 93 min )
  • Open

    RL with Constraints, High Dimensional State Space
    I have an environment where there are multiple agents being represented by one neural network (so the policy outputs all of their actions). These actions as time goes on should not exceed a certain constraint level or they will put the environment into an undesired an irrecoverable state. I am wondering what the best way to inform these agents of this cumulative action constraint? I have appended it to my state vector but since the observation without this cumulative action is still a 625*1 vector, I think adding that constraint as just one additional state is causing it to be drowned out by the state size. Any ideas of how to addreess? submitted by /u/Feisty_Relation_2359 [link] [comments]  ( 9 min )
    Nash equilibrium in Multi agent RL
    I have a multi agent competitive RL problem which I solved. Now, I want to show that all agent’s policies are at a nash equilibrium of the problem. How can I do this? Also, some things must be considered. First, I can’t mathematically model the environment so some how I have to numerically show that they reached nash eq. Another thing that I find is deviate the action of each agent and show that they don’t get a better reward but the problem is there is a actor network for each agent. How can I show deviation from the optimal policy? submitted by /u/Brief-Emotion6291 [link] [comments]  ( 9 min )
    Machine Learning / Twitter (X) Community
    submitted by /u/x9182 [link] [comments]  ( 9 min )
    Need Help Designing A2C Agent with Monotonic Bidding Curve Constraints
    I'm attempting to train an agent using A2C, where the agent generates a vector as its action at each time step. This vector represents a bidding curve, and a crucial property is that it must always increase monotonically. Otherwise, the bid is considered invalid. For example, [0, 1.2, 4.5, 58, 92.65, 104.3, 104.3] is valid because each number is greater than or equal to the previous one. I'm looking for guidance on how to design this setup, impose these constraints, and handle cases where the agent violates the sequence. While using negative rewards might not be effective due to the potential for generating numerous invalid bids, I'm unsure about the right approach. Could someone assist me with this? submitted by /u/uonliaquat [link] [comments]  ( 9 min )
  • Open

    Saving Green: Accelerated Analytics Cuts Costs and Carbon
    Companies are discovering how accelerated computing can boost their bottom lines while making a positive impact on the planet. The NVIDIA RAPIDS Accelerator for Apache Spark, software that speeds data analytics, not only raises performance and lowers costs, it increases energy efficiency, too. That means it can help companies meet goals for net-zero emissions of Read article >  ( 6 min )
  • Open

    AI vs a giraffe with no spots
    On July 31, 2023, a giraffe with no spots was born at Brights Zoo in Tennessee. She's a uniform brown with pretty white highlights around her face and belly, like a Jersey cow or a white-tailed deer. Image recognition algorithms are trained on a variety of images from  ( 5 min )
    Attempts to generate a spotless giraffe
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    Machine Learning / Twitter (X) Community
    submitted by /u/x9182 [link] [comments]  ( 9 min )
  • Open

    Empowering cyber guardians: How AI is changing the landscape of protection
    In the ever-evolving battle against the digital dark forces, the defenders of the virtual realm find themselves facing a barrage of ever-advancing threats. From the labyrinthine corridors of the Deep Web to the stealthy maneuvers of nation-state actors, the cyber landscape is as treacherous as it is vast. As our dependency on digital infrastructure deepens,… Read More »Empowering cyber guardians: How AI is changing the landscape of protection The post Empowering cyber guardians: How AI is changing the landscape of protection appeared first on Data Science Central.  ( 21 min )

  • Open

    [P] GPT4 Contextual Decomposition Template
    Complex tasks with LLMs like ChatGPT/GPT4 are best broken down by first asking ChatGPT to outline the steps and then asking the LLM to execute against those steps that it defined. I first came across this interesting technique on Twitter recently. While it’s OK to do this once in OpenAI’s playground, it's difficult to make this repeatable and streamlined. When I wanted an LLM to do something complex, I wanted to be able to plug into a template instead of thinking about and setting up the contextual decomposition process. I made this Contextual Decomposition Template to help solve this problem: https://lastmileai.dev/workbooks/cllqfl5c600rdpgnhh2su2fa0 With a document and objective, this template allows you to quickly get to the answer through defining intermediate steps and executing according. Parameters are set up so you can easily change the goal, document, and objective and click 'Run All' to get the final results. Please let me know if you have feedback! I'm also very curious if you have other interesting techniques with complex tasks and workflows working with LLMs. submitted by /u/InevitableSky2801 [link] [comments]  ( 9 min )
    [D] Questioning the Nature of AI
    submitted by /u/SensitiveAd6425 [link] [comments]  ( 9 min )
    [D] How can I benchmark my PC/GPU and compare it to others online, sort of like 3DMark?
    I have a RTX 2070 GPU and I'm wondering if there's any benchmarking tool where I can also see where others stand compared to the specs of my machine. submitted by /u/Al_Miksiki [link] [comments]  ( 9 min )
    Experience with pain detection approaches [P]
    ​ https://preview.redd.it/6t50ye377qkb1.png?width=1186&format=png&auto=webp&s=6def3f6ffdac50dc81d58b6f754366bf88570044 submitted by /u/adamjbradley [link] [comments]  ( 9 min )
    PUMA: A framework for secure and efficient evaluation of Transformer models [R]
    Concerns surrounding data privacy and security in AI have shifted to the limelight with the arrival of Large Language Models (LLMs). Despite the popularity of models like ChatGPT, potential drawbacks pose worries. Now, a new framework named PUMA promises to address these crucial concerns with an unprecedented touch of precision and efficiency. Can't keep track of this rapidly progressing tech world? Subscribe here to stay informed. https://preview.redd.it/tyr2mz3d4qkb1.png?width=1600&format=png&auto=webp&s=d8d771da5bbfa5cd53ab2823c5d7dad6f369109d What makes PUMA special? An ingenious approach: PUMA merges secure multi-party computation (MPC) with efficient inference, bridging the capabilities of Transformer models and security concerns. Redefining LLMs with three entities: the model…  ( 10 min )
    [D] I need to vectorize 100tb+ of data, multiple GPU's per machine or multiple machines?
    TLDR: Is it ok to use two 4070ti's in a machine if all you need is more cuda cores to create embeddings and don't care about memory capacity, i.e. not for LLM's Background I have 20tb of text data (size in mongo) and 80tb of images (stored at 800x600-800) on my homelab on ssd's which i'm in the process of vectorizing and creating embeddings for. I have a 3090 with two python scripts, each script does the same thing, fetches a batch of records from mongo, grabs the image from the ssd, downsizes the image, creates embeddings, then uploads to qdrant (vector search engine) in a batch. ​ Current setup Ryzen 9 7950x, 64gb ddr5, rtx 3090 -this is the one creating the embeddings currently. 1st gen 32 core epyc with 512gb ddr4 and ~200tb of ssd storage - holds all the data and databases and…  ( 11 min )
    [D] K-Means from scratch | Learning ML
    Hello everyone. I started to study some Machine Learning algorithms, specifically K-means, but I'm not sure if I did it correctly for several reasons: - In the Kmeans that I did, I normalize the data because they mention that it helps a lot, but if I don't, the algorithm stops classifying normally and shows me badly grouped points. - As I mentioned, when looking at the graphs of the grouped points, I can see how many of the points are clearly closer to certain centroids, but he classified them as others, this reaches the level of a misclassified point next to the centroid when that should belong - Despite the fact that it has a threshold to be making iterations, the algorithm ends in less than 10 even though it has placed 100 iterations. I know that it can depend on the dataset and the generated centroids, but it seems excessive to me that it ends so soon and with results like Iris datset (60, 13, 77) when it should be (50, 50, 50) or a minimum to be maintained for those values. I leave the code in GH in case someone can help me: https://github.com/vanstrouble/kmeans-from-scratch.git submitted by /u/vanstrouble [link] [comments]  ( 9 min )
    Poker Playing Robot [D]
    Hello, So for a project we wanted to create a robot that can play poker. This robot will first only be used on software but eventually we are hoping to add hardware. We want to be able to make two bots and put them agansit each other so they learn by machine learning. Once we find that they are skilled and understand we would like to be able to actually play them. I have heard of similiar projects to this online and on reddit. If anyone has any information about how to go about this or ideas, or just anything please let me know. I would love to have help on this project. submitted by /u/Jake1900ooo [link] [comments]  ( 9 min )
    [P] DLAS Dataset
    submitted by /u/Why_is202 [link] [comments]  ( 9 min )
    [D] How to structure/manage a machine learning experiment? (medical imaging)
    I'm in the strange position of having the task of developing a machine learning pipeline/system/process in an academic environment without the benefit of much in the way of formal training in ML (I'm more of a classical stats for hypothesis testing kinda guy). The particular project is using machine learning on medical images (head CT scans) to detect a relatively rare condition. As usual the goal is to eventually have some automatic process for diagnosis support. This particular condition is something that diagnostic radiologists can always detect if they look in the right place on the image, the problem is that they often don't look in the right place. After talking to colleagues with more experience (but less time) it's something which in principle can be achieved with more or less "off the shelf" code put together in the right order and with appropriate hyperparameters. This stage of the project is aiming for a proof of principle, rather than anything deployable. We're lucky to have a decent amount of data inside a trusted research environment. I've done some hobby-level stuff and tutorials, but overall I'm coming into this with a lot more experience with medical imaging than with computer vision or machine learning. After all that preamble here's my question: What does a decent CV/ML experiment look like? Left to my own direction I can see myself picking 3 different approaches of varying complexity, trying to get the best out of each of them, and then presenting a comparison of performance or accuracy of all of them. I then claim the "best one" as the one we move on with. There are a lot of tools out there for experiment tracking (eg neptune.ai), but I'm really not sure whether that sort of thing is over the top for what I need to do. Any tips or experience that you folks don't mind sharing? submitted by /u/PrivateFrank [link] [comments]  ( 10 min )
    [D] Limit the Number of Papers I Review on OpenReview?
    Hello, Does anyone know if it's possible to set a limit to the number of papers you are assigned as a reviewer on OpenReview? Specifically for ICLR 2024. I saw a Twitter thread about this option before for ICML. It blows my mind that this is not easy to change. I got 5 papers for the last NeurIps which was very overwhelming. As reviewers, we provide a free service to the community, and we should be allowed to pick how much work we want to undertake... submitted by /u/cringe_reddit_user69 [link] [comments]  ( 9 min )
    How can I change the orientation of a frame mockup using AI? [P]
    Hi all. I'm hoping someone out there can help me solve this. TLDR: How do I change the orientation of portrait frames to landscape frames while keeping the mockup essence the same. Link: https://ibb.co/album/hx6wp3 Basically, I have two portrait frame mockups that came in a bundle and the bundle had no landscape frame mockups at all. So, naturally I'd like to make my own since I have a lot of landscape artworks that could be displayed in the mockups. How can I change the orientation display of my mockup? I've tried using Photoshop's generative AI software and got nowhere. It keeps giving me a new frame design when I want to keep the original frame so it matches the set. Any leads on how this can be done would be appreciated. submitted by /u/Ambilina [link] [comments]  ( 9 min )
    [D] Product search using LLM
    Hey!One of my friends brought up an idea about using LLM for product search and we started talking about the idea and approach. Per my understanding what would need to be done is to train some smaller language model on the product data, create embeddings from the product info and make the model use this as a body of knowledge. My issue is that if this was ever to be done on commercial scale it seems very complex to me, since the embeddings would have to be re-created every time a new product is introduced? Let me know what you think or how you would approach this, as I'm trying to see different PoV's and everyone here has more experience than me. ​ Thanks! submitted by /u/LukaAda [link] [comments]  ( 9 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 9 min )
    [R] New preprint on detecting errors in LLM prompt response
    We just released as study where we show that a "diversity measure" (e.g., entropy, Gini, etc.) can be used as a proxy for probability of failure in the response of an LLM prompt; we also show how this can be used to improve prompting as well as for prediction of errors. We found this to hold across three datasets and five temperature settings, tests conducted on ChatGPT. Preprint: https://arxiv.org/abs/2308.11189 Source code: https://github.com/lab-v2/diversity_measures Video: https://www.youtube.com/watch?v=BekDOLm6qBI&t=10s ​ Example result showing correlation of entropy with failure probaiblity submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    [D] Do papers like this "disprove" the stochastic parrot theory? Pretty strong evidence that LLMs can build an internal world model, at least for simple board games.
    https://arxiv.org/abs/2210.13382 submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [P] give me ideas on visualization.
    I have written AI model to predict NHL games and now working on visualization. No tech talk, just visual, assume I gather all possible data. I would like to make it a prediction dashboard and not sport dashboard so simple stats are not recommended. Data on the image is made up, don't bother. I am using matplotlib + seaborne (Python) submitted by /u/Fifa_ToNieMiami [link] [comments]  ( 9 min )
    [P] DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
    submitted by /u/ghosthamlet [link] [comments]  ( 9 min )
    [R] Challenges and Applications of Large Language Models - University College London 2023 - 72 Pages!
    Paper: https://arxiv.org/abs/2307.10169 Abstract: Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas. In this paper, we aim to establish a systematic set of open problems and application successes so that ML researchers can comprehend the field's current state more quickly and become productive. https://preview.redd.it/sng6uk7tcmkb1.jpg?width=657&format=pjpg&auto=webp&s=2ed693a88097cc8cbcd72ecd8c0d36820629625d https://preview.redd.it/wslkgm7tcmkb1.jpg?width=478&format=pjpg&auto=webp&s=9908f28717c8bd98d48d4559ccc2db9cc3796bee https://preview.redd.it/12q01l7tcmkb1.jpg?width=471&format=pjpg&auto=webp&s=1ca1eb54f679cf8a12f10aaf790d607db7bb363c ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [Project] UForm-v2: tiny CLIP-like embeddings in 21 languages with extreme performance
    Vision-Language understanding Transformer, which has 40% fewer parameters than vanilla CLIP while performing much better on text-to-image retrieval, where it's also beneficial that output embeddings have 2x fewer dimensions (256 vs 512). Moreover, it supports 21 languages, including popular English, Hindi, Chinese, Arabic, and lower-resource languages like Ukrainian, Hebrew, and Armenian. Demo: http://usearch-images.com/ Github: https://github.com/unum-cloud/uform https://i.redd.it/6133eyj73mkb1.gif submitted by /u/vov_or [link] [comments]  ( 9 min )
    [D] How is a language model applied on Speech-to-text models such as Wav2Vec 2.0 ?
    I'm new to speech processing. As I read the paper on wav2vec 2.0, I see them mentioning the use of language models in decoding, particularly a 4-gram model and a Transformer. As far as I'm aware, the encoder (wav2vec2) will output a probability sequence of L x V (where V is the vocab size, L is sequence length). I have two questions: I learned that a n-gram language model would predict the probability of a n-gram given previous context words, but how is a Transformer implemented here ? Does it follow a causal structure such as GPT and then estimate sequence likelihood ? How can a language model, trained to estimate next word (n-gram) probability given previous context, be used to decode the output sequence given the L x V probability outputs from above ? Many thanks ! submitted by /u/KarmaCut132 [link] [comments]  ( 9 min )
    [R] DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting
    We introduce a new operator, called 3D DeFormable Attention (DFA3D), for 2D-to-3D feature lifting, which transforms multi-view 2D image features into a unified 3D space for 3D object detection. ​ Comparisons of feature lifting methods. Existing feature lifting approaches, such as Lift-Splat-based and 2D attention-based, either use estimated depth to get pseudo LiDAR features and then splat them to a 3D space, which is a one-pass operation without feature refinement, or ignore depth and lift features by 2D attention mechanisms, which achieve finer semantics while suffering from a depth ambiguity problem. In contrast, our DFA3D-based method first leverages the estimated depth to expand each view's 2D feature map to 3D and then utilizes DFA3D to aggregate features from the expanded 3D feature maps. With the help of DFA3D, the depth ambiguity problem can be effectively alleviated from the root, and the lifted features can be progressively refined layer by layer, thanks to the Transformer-like architecture. In addition, we propose a mathematically equivalent implementation of DFA3D which can significantly improve its memory efficiency and computational speed. We integrate DFA3D into several methods that use 2D attention-based feature lifting with only a few modifications in code and evaluate on the nuScenes dataset. The experiment results show a consistent improvement of +1.41\% mAP on average, and up to +15.1\% mAP improvement when high-quality depth information is available, demonstrating the superiority, applicability, and huge potential of DFA3D. 🔥 Code: https://github.com/IDEA-Research/3D-deformable-attention.git 🔥 Paper: https://arxiv.org/abs/2307.12972 submitted by /u/HYeung_Lee [link] [comments]  ( 9 min )
    Shanghai AI Lab and NTU Unveil MATLABER: A Pioneer in Text-To-3D Creation [R]
    AI researchers from Shanghai AI Laboratory and Nanyang Technological University are breaking new ground with their creation of MATLABER, an innovative text-to-3D pipeline. If you want to stay ahead of the curve in AI and tech, look here first. https://preview.redd.it/8walduw7dkkb1.png?width=806&format=png&auto=webp&s=4908181a408d990ed224a503a63d78d204e460be Why this matters: Text-to-3D pipelines are a hot topic in AI Change: The ability to create 3D assets from textual descriptions can revolutionize the industry, reducing time, labor, and skill requirements. MATLABER conquers a longstanding issue: Overcoming the challenge of restoring high-fidelity object materials in text-to-3D pipelines, MATLABER expands the applicability of these technologies in real-world scenarios. Material-aw…  ( 9 min )
    [R] new diffusion model for music generation
    submitted by /u/jmoso13 [link] [comments]  ( 9 min )
  • Open

    Robotics and Artificial Intelligence: Pioneering a Longer, Healthier Life
    How large an impact do you think AI and robotics will have on healthcare, overall quality of life, and extending lifespans? The following article seeks to explore when we might possibly see AI & robotics fully integrated within society. https://www.catchingimmortality.com/technology-for-the-future/robotics-and-artificial-intelligence-pioneering-a-longer-healthier-life ​ submitted by /u/catchingimmortality [link] [comments]  ( 9 min )
    Will AI TV Shows Ever Be A Thing? (via prompt)
    Do you think there will ever be a time where, with a prompt, you could see entire TV Shows or an entire episode? ​ For example wanting to see what could of happened if alternate stuff happened in Dragon Ball Z, Or Breaking Bad if xyz. Of course there'd be a lot of uprising against it, but, do you think the time will ever come where this will be possible? submitted by /u/Different_Effective3 [link] [comments]  ( 9 min )
    Text to artful animation?
    I would like to be able to input phrases such as "artistic line drawings of birds flying through a blue sky spotted with clouds" or "colorful balloons moving around in slow motion like a 90's screen saver" or "time lapse of the moon moving across the starry night sky" etc. I want the AI to create minimalist, short (maybe 5 mins) animations from these sort of inputs. Can anyone point me in the right direction? submitted by /u/petworthy [link] [comments]  ( 9 min )
    How artificial intelligence sharpens blurry thermal Night Vision images
    submitted by /u/cranberryfix [link] [comments]  ( 9 min )
    AI and labor market/work life
    Hey peeps! I try to keep up with what's happening with the labor market and working life and how AI affects these areas. I am looking for good sources where you can stay up to date on this! What are some good podcasts, newsletters, books and the like that you should keep an eye on? submitted by /u/emillindstrom [link] [comments]  ( 9 min )
    Where can I find this AI voice?
    Hi all, I've heard this voice used alot recently, where can I find it/use it? Thanks submitted by /u/Fightingdaduk [link] [comments]  ( 9 min )
    How Does GPT-4 Work and How Do I Build Apps With It?
    Understanding GPT-4 What is GPT-4? GPT-4 (Generative Pre-trained Transformer 4) is a machine learning model for natural language understanding and generation. It works by analyzing a large dataset and generating text based on the input it receives. How Does It Work? GPT-4 uses deep neural networks with multiple layers to predict the next word in a sequence of words. The model has been trained on a wide range of internet text, so it's capable of understanding and generating coherent and contextually relevant text based on the prompts it's given. Building Apps with GPT-4 Step 1: Get API Access To use GPT-4, you'll first need access to its API. OpenAI provides this service, and you can apply for an API key from their website. Step 2: Choose Your Programming Language You can integrate the GPT-4 API into your application using various programming languages such as Python, JavaScript, or Ruby. Step 3: Making API Calls Once you've chosen your language, you'll make RESTful API calls to communicate with GPT-4. You'll pass your prompt as an input and receive generated text as output. Example in Python Here is a simple Python example using the openai library to interact with GPT-4: ```python import openai openai.api_key = "your-api-key-here" response = openai.Completion.create( engine="text-davinci-002", prompt="Translate the following English text to French: '{}'", max_tokens=60 ) print(response.choices[0].text.strip()) ``` Step 4: Handle Rate Limits OpenAI's API comes with rate limits, so you'll need to manage these by either queuing requests or handling retries. Step 5: Deployment After testing and fine-tuning, deploy your application. Ensure that you are abiding by OpenAI's usage policies and guidelines. Conclusion GPT-4 is a powerful tool for natural language understanding and generation. By understanding its workings and following the steps to integrate it into an application, you can leverage its capabilities for various use-cases. submitted by /u/nicdunz [link] [comments]  ( 10 min )
    10 Facts about Quantum Computing and AI That You Probably Didn’t Know
    Quantum computing can solve problems in seconds that would take classical computers millions of years. AI algorithms can be used to optimize quantum circuit design. Google's "quantum supremacy" claimed to perform a task in 200 seconds that would take classical supercomputers 10,000 years. Quantum Machine Learning algorithms could potentially revolutionize AI by enabling faster training and better optimization. Quantum error correction is a big challenge, as quantum bits (qubits) are highly susceptible to errors. AI can help in auto-correcting such errors in quantum computations. Quantum annealing, a specialized form of quantum computing, is being used for optimization problems in machine learning. Quantum computing's "quantum entanglement" can enable much more efficient parallel processing. AI-based quantum simulators can model complex quantum systems that are impossible to study otherwise. Quantum encryption, backed by the principles of quantum mechanics, can enhance AI security. submitted by /u/nicdunz [link] [comments]  ( 9 min )
  • Open

    Curvature at Cairo
    I was flipping through Gravitation [1] this weekend and was curious about an illustration on page 309. This post reproduces that graph. The graph is centered at Cairo, Egypt and includes triangles whose side lengths are the distances between cities. The triangles are calculated using only distances, not by measuring angles per se. The geometry […] Curvature at Cairo first appeared on John D. Cook.  ( 6 min )
    Calculating the intersection of two circles
    Given the equations for two circles, how can you tell whether they intersect? And if they do intersect, how do you find the point(s) of intersection? MathWorld gives a derivation, but I’d like to add the derivation there in two ways. First, I’d like to be more explicit about the number of solutions. Second, I’d […] Calculating the intersection of two circles first appeared on John D. Cook.  ( 6 min )
  • Open

    PMET: Precise Model Editing in a Transformer
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Diversity Measures: Domain-Independent Proxies for Failure in Language M...
    submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    Neural-Network transliteration of the Codex Seraphinianus
    submitted by /u/Marc_Op [link] [comments]  ( 9 min )
  • Open

    Python library for modular RL components
    After a year of struggling with RLlib I decided to start implementing the training code myself. I am looking for a RL library that offers me individual components rather than the whole algorithm. I do not need a PPO implementation, but I would fancy a library that offers me functions to compute the PPO loss given a batch of steps. In other words, what I need is a library that offers the most granular RL components (different losses, replay buffers, return estimators like GAE, etc) instead of full algorithm implementations. Which libraries do you recommend for this purpose? submitted by /u/fedetask [link] [comments]  ( 9 min )
    Choosing best RL library for MuJoCo with envpool
    TL;DR What RL library use in combination with MuJoCo and envpool Hi I want to write program that would find best hyperparameters (number of joint, angles) for design of robots (similar to NAS). It would work in such a way that I would have one RL algorithm that would search for the hyperparameters of the robot and then I would to train and evaluate this robot using SAC in MuJoCo physical simulator. Problem is that MuJoCo runs on CPU and I need lots of parallel enviroments and for this I would use envpool https://github.com/sail-sg/envpool. The question is what (if any) RL library should I use as a wrapper. The options are Stable-Baselines3, Tianshou, ACME, CleanRL, or rl_games. picture of one robot design https://imgur.com/a/5UDdEsE Other than that, do you have any recommendations or notes regarding my project idea? Thanks for response submitted by /u/EFK1500 [link] [comments]  ( 9 min )
    Action selection in Multiple action for continuous state spaces in DDPG
    I have a confusion in action selection in actor of DDPG algorithm. The actor receive state as input and output as deterministic action (generally from tanh function). In the multiple continuous action environment, does the actor perform multiple action simultaneously from the clipped output Tanh [-1,1]? or it has some posterior function converting from Tanh vectors to single deterministic action like Softmax? submitted by /u/AnnonymeowCat [link] [comments]  ( 9 min )
    Mathematics of Best of n Sampling
    Best of n Sampling is a surprisingly simple technique to steer an LM to human preferences much in the same way as Reinforcement Learning algorithms such as RLHF do. Here is the blogpost [0] describing Best of n. [0] https://preview.redd.it/cpi5tj3injkb1.png?width=1670&format=png&auto=webp&s=33eb3f301b515926fd5820ea3c60acd0e1c5ddb1 The blog post claims that one neat property of Best-of-n sampling is that the KL divergence with the initial policy can be computed analytically in closed form. ​ https://preview.redd.it/eij0igaxnjkb1.png?width=1724&format=png&auto=webp&s=73040d49ae55ac651c5fe62b0f4a06b7f8bfd2c5 This turns out to be https://preview.redd.it/gbn7ch10ojkb1.png?width=270&format=png&auto=webp&s=849fe773b31b795c9253fa4cd8172c3120aec745 The blog post provides a hint to express the pdf of BoN in terms of PDF and CDF of the original distribution, but I cannot see how I can express the PDF of BoN in terms of PDF and CDF of the original distribution. Can anyone help me with this? [0] https://openai.com/research/measuring-goodharts-law submitted by /u/ElendirThreadripper [link] [comments]  ( 9 min )

  • Open

    Getting random latents in W+ space [D]
    I'm trying to get roll, pitch, yaw directions in W+ space. Initially, I need like 10k generated images, which I'll get top %5 and bottom %5 for the features I want. I tried to sample from uniform distribution but it fails since W+ is not uniformly distributed. How do I achieve this? submitted by /u/cltexe [link] [comments]  ( 9 min )
    [D] An AI's response to: "Consciousness in Artificial Intelligence: Insights from the Science of Consciousness."
    submitted by /u/ronin_zz123 [link] [comments]  ( 16 min )
    [Discussion] LLMs in business
    Every business on the planet will want to train and feed its own LLM asap to not fall behind. \super computer = tech needed to train a LLM fast on unlimited data* (1) Does a company like McKinsey (confidential data) train their LLM in-house or in the cloud? (2) Do enough super computers exist for every company to start training their LLM today? (3) Is there even a single company that ships super computers capable of training LLMs in-house? (4) McKinsey will want to train their LLMs on all data they have from their customers so that McKinsey can work at max efficiency. Customers won't like that. Is it possible to un-train specific data sets? (5) Would it be possible to feed the LLM with the customer's data instead of training the LLM on the data? What would be the differences? If you feed it the data, then the LLM can't work with the data as well as it could if you trained it on said data? The future is just so damn exciting and I have all these questions popping up so I hope some educated folks can share some insights! Thanks for reading! submitted by /u/MopPanda [link] [comments]  ( 9 min )
    [D] Was trying out Llama2 13B MegaCode2 OASST on my local pc
    https://im3.ezgif.com/tmp/ezgif-3-b05ffc9d5f.gif submitted by /u/theswiftdeveloper [link] [comments]  ( 9 min )
    [D] Comparing Score-Based and Diffusion Models in Theory and Practice
    In theory, it has been demonstrated that score matching models and diffusion models share mathematical similarities. However, in practice, the equivalence between the two approaches may not extend to code implementations. While PyTorch implementations for diffusion models are relatively common, finding equivalent implementations for score-based models can be more challenging. submitted by /u/whysomeonetookmyname [link] [comments]  ( 9 min )
    [D] Industry design patterns for fast-moving ML/DL
    I have been writing ML code (both training and off-the-shelf model inferences) for close to six years now, but mostly in an academic/personal project setting. Now, I find myself spearheading an ML project at a big company, and our backend code base keeps growing, and other people depend on it. There are layers to it, with threads spawning, and dependencies on caches and databases for state sharing. It's more than a pet project - you get the gist. I want to design production-ready architectures that are more robust than piecemeal/make-shift solutions. Do people have resources or suggestions on what established design patterns work in the industry? I have found it hard to find resources just by googling because the pace at which ML research works makes most books/tutorials outdated. Take retrieval augmented generation, for example. Do you store your documents in an elasticsearch store and build indices periodically or do you store them in FAISS? How separated is your retrieval module from your LLM call? Do you host in-house LLM's centrally company-wide or per-project? What has worked for you so far in the industry? submitted by /u/whyusenosqlreddit [link] [comments]  ( 9 min )
    [D] How do you normalize a large taxonomy with lot of similar words.
    I have a large taxonomy of work titles I scraped from linkedin and other career sites. Now I ahve like 90k titles. To reduce them or group them into a sort of 5k unique titles I tried k means clustering but didn't work out good. How do I proceed with this task? Any pointers would be appreciated. submitted by /u/wet_cosplay [link] [comments]  ( 9 min )
    [R] NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes
    submitted by /u/KaleidoscopeBest1569 [link] [comments]  ( 9 min )
    [Research] Scholars Program
    Hi everyone. We recently announced the Cohere For AI scholars program, a 8 month full-time paid industry research role to join our team and work on fundamental machine learning at scale. The goal is to support rising stars in ML pursue curiosity driven research w access to large scale engineering resources and mentorship. We have intentionally structured the program to be paid and remote-first so we can support talent all across the world. You will have access to a top tier research team and you can find some of our prior publications here. Our deadline is coming up on September11th. Wanted to make sure this was visible to researchers around the world, and thought many in this forum would be interested. More details below for anyone interested: The Cohere For AI Scholars Program supports the next generation of rising ML stars as they embark on their research journey by providing an alternative point of entry into NLP research. Scholars will have access to a large-scale experimental framework and work alongside some of the best researchers and engineering expertise in the world. Participation is full-time, remote-first and paid. For more details, check out our blog post announcing the Scholars Program launch. Applications are open until September 11, 2023. For those undertaking application, would highly recommend joining our open science discord where we have a highly active FAQ channel for any questions about the program. You can find out more about how to join at cohere.for.ai. Looking forward to reading your applications! submitted by /u/ml_magic_ [link] [comments]  ( 9 min )
    [D] How does a ML model differentiate between Nominal and Ordinal?
    Suppose I have data about cars. In it there are multiple columns like 'Type' which contains "Sedan", "Hatchback", "Convertible" and "Minivan". Then there are 'Color' like "Red", "White", "Blue", etc. And I have used ordinal encoding for 'Types' columns and label for 'Color' column. How will the model know that Types is ordinal while Color is nominal. PS. Suppose I cannot use One Hot encoding as it will increase the no of columns by 20 or 30. submitted by /u/Luffykent [link] [comments]  ( 9 min )
    UK Startup Etcembly Unveils AI-Designed Cancer Immunotherapy [N]
    Etcembly, a UK-based biotech startup, has disclosed one of the first generative AI-designed immunotherapy candidates, known to target a protein present in many cancers. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/n6nwoeloohkb1.jpg?width=1200&format=pjpg&auto=webp&s=defa11280ea75f1e6e26ff9014b7e673b0a181ea Key highlights: Etcembly's AI-designed immunotherapy is innovative: The startup used generative AI to design novel cancer immunotherapy in record time. The therapeutic 'ETC-101' was created and optimized in just 11 months, compared to the traditional two years typically needed. The value of AI makes itself evident: Etcembly's AI engine, EMLy, uses LLMs to predict, design, and validate candidate TCRs, scanning…  ( 10 min )
    Understanding the Constraint of Weight Sums in Loss Functions for Noisy Label Learning [Discussion], [Question]
    Working on a machine-learning task with a dataset full of noisy labels. Thinking of using reweighted loss to tackle the label noise issue. I get that it helps give more importance to clean samples during training. But, about the sum of these weights used in the loss function - should they always add up to 1? What's the reasoning behind this constraint? Can't the weights sum up to any positive value instead? Also, if I intend to assign loss values with probabilities, does the weighted sum still need to be 1? Need help clarifying if my understanding is correct! submitted by /u/Positive_External_27 [link] [comments]  ( 9 min )
    [R] To Compress or Not to Compress- Self-Supervised Learning and Information Theory: A Review
    submitted by /u/hardmaru [link] [comments]  ( 9 min )
    [D] RL[HF] on diffusion models & vision models
    Recently came across: https://datasciencecastnet.home.blog/2023/04/06/a-recipe-for-training-good-generative-models/ and this paper: https://arxiv.org/pdf/2302.08242.pdf The first article is very interesting as it suggests incorporating RLHF in the stack of building a strong diffusion model, the second article demonstrates that it is possible to create stronger computer vision systems with further fine-tuning on metrics (reward functions) that are not differentiable (!), such as mAP for object detection, which I personally found super interesting. These observations makes me think the "general" recipe to build a very good AI model (not only restricted to LLMs) is pretty aligned with what has been done with ChatGPT : 1- supervise fine-tune on a target domain / 2- design & build a reward model / 3- Further align the generations & output with RL Just curious if anyone has any experience with RL + diffusion & vision models? Why do you think this is not super popular yet? submitted by /u/mzitoune [link] [comments]  ( 9 min )
    [D] RL(HF) + diffusion models & vision models
    Recently came across: https://datasciencecastnet.home.blog/2023/04/06/a-recipe-for-training-good-generative-models/ and this paper: https://arxiv.org/pdf/2302.08242.pdf The first article is very interesting as it suggests incorporating RLHF in the stack of building a strong diffusion model, the second article demonstrates that it is possible to create stronger computer vision systems with further fine-tuning on metrics (reward functions) that are not differentiable (!), such as mAP for object detection, which I personally found super interesting. These observations makes me think the "general" recipe to build a very good AI model (any modality) is pretty aligned with what has been done with ChatGPT : 1- supervise fine-tune on a target domain / 2- design / build a reward model / 3- Align the generations / output with RL Just curious if anyone has any experience with RL + diffusion & vision models? Why do you think this is not super popular yet? ​ [link] [comments]  ( 9 min )
    [N] Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B
    Blog: https://www.phind.com/blog/code-llama-beats-gpt4 Models: https://huggingface.co/Phind/Phind-CodeLlama-34B-Python-v1 https://huggingface.co/Phind/Phind-CodeLlama-34B-v1 submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] Llama 2, CodeLlama, and GPT-4 performance: A write-up on the LLM developments and research.
    submitted by /u/seraschka [link] [comments]  ( 9 min )
    [D] Recursive Least Squares vs Gradient Descent for Neural Networks
    I have been captivated by Recursive Least Squares (RLS) methods, particularly the approach that employs error prediction instead of matrix inversion. This method is quite intuitive. Let's consider a scenario where you need to estimate the true effect of four factors (color, gender, age, and weight) on blood sugar. To find the true impact of weight on blood sugar, it's necessary to eliminate the influence of every other factor on weight. This can be accomplished by using simple least squares regression to predict the residual errors recursively, as shown in the diagram below: Removing the effect of all factors on \"weight\" in a recursive manner The fundamental contrast between RLS and Gradient-based methods lies in how errors are distributed across inputs based on their activity, leading to the subsequent update of weights. However, in the case of RLS, all inputs undergo decorrelation before evaluating prediction errors. Comparison between error sharing in RLS and GD This de-correlation can be done in few lines of python code: for i in range(number_of_factors): for j in range(i+1, number_of_factors): wx = np.sum(x[i] * x[j]) / np.sum(x[i]**2) x[j] -= wx * x[i] This approach also bears relevance to predictive coding and can shed light on intriguing neuroscientific findings, such as the increase brain activity during surprising or novel events — attributable to prediction errors. The prediction errors are increasing during the surprising events similar to how brain activity increases. RLS learns very fast but it's still subpar to deep learning when it comes to non-linear hierarchical structures but that is probably because Gradient based methods enjoyed more attention and tinkering from the ML-community. I think RLS methods needs more attention and I have been working on some research projects that uses this method for signal prediction . If you're interested, you can find the source code here: https://github.com/hunar4321/RLS-neural-net ​ submitted by /u/brainxyz [link] [comments]  ( 10 min )
    [P] LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Multi-Query Attention
    submitted by /u/hkproj_ [link] [comments]  ( 9 min )
    [Discussion] What Model to Choose for a NN with a Very Wide Output Layer?
    The input of my neural network consists of 20 features, whereas the output consists of 20,000 of them (predicting a "quantum classical shadow" based on a few parameters: the rotation angle as the input and a few hundreds of shots of random measurements as the output). AFAIK, it's a linear regression problem. What I've tried: - an FCNN (doesn't work good); - Scikit-Learn Lasso (the same results); - MSE regression using Neural Tangents (the same). Any ideas on how to solve this? Thanks a lot in advance! submitted by /u/avpol111 [link] [comments]  ( 9 min )
    [D] What's the best model for iterative ranking determination from pairwise comparisons?
    There are many entities: A, B, C, D... ( B; C > A; D > C; ... A comparison is expensive. Objective: to approximate the absolute order of entities (best entities at the top of the list, worst at the bottom), minimize the number of comparisons The worst solution would be just applying a sorting algorithm, which would require n log n comparisons. I believe an active sampling technique would be required, i.e. select a number of entities with the highest uncertainty, and do comparisons with them, adjust the model, repeat. ChatGPT suggests a Bradley-Terry model and even gives an implementation example. I wonder if there is anything better? submitted by /u/gintrux [link] [comments]  ( 9 min )
    [R] Simpler decision tree implementation question?
    I am trying to implement a decision tree in a very computationally dumb software which can only execute if else statements. If the decision tree is trained some place else and then shared to this software could I deploy the model as a bunch of if else statements. If so how would I know the exact comparison order which would be needed for the if else statements and since this would require to know every detail of the decision tree would I have to make the whole algorithm from scratch so I can access every nook of the decision tree or is there a library which let me access every weight and know what's the weight of each branch? Sorry if it's a dumb question. submitted by /u/ghostfreak999 [link] [comments]  ( 9 min )
    Apple researchers propose a novel method for creating detailed 3D models from images [R]
    Traditional methods of creating 3D models from images often rely on estimating the depth of each pixel in the image, which can result in errors or missing details in areas that are transparent or have low texture. A team of researchers from Apple and UCSB have proposed a new method that directly infers the 3D geometry of a scene using deep neural networks, without requiring any test-time optimization. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/pqxjeafi0dkb1.png?width=748&format=png&auto=webp&s=8daefa852a8805b48cc8586a4a8ec94e5e49123c Why this matters: 3D reconstruction is a fundamental problem in computer vision and graphics: it has many applications in entertainment, education, medicine, and engineering. Howe…  ( 10 min )
  • Open

    'Generative Inbreeding' and its Risk to Human Culture
    submitted by /u/cranberryfix [link] [comments]  ( 9 min )
    OpenAI Just Bought a Game Studio Working on a "Minecraft" Clone
    submitted by /u/cranberryfix [link] [comments]  ( 9 min )
    Best AI companies for you to invest in 2023 (Tabular Comparison included)
    AI is advancing at exponential rate. Its growth is limitless. I have compiled a list of best AI company which are hot stocks right now to invest in 2023. Take a look at them carefully. Meta Platforms Co., Ltd. (META) Meta’s user engagement by 7% in the second quarter. Bank of America has a Buy rating on META stock and a price target of $375 (it closed at $316.56 on Aug. 7). Alphabet Inc. (GOOG, GOOGL) Bank of America has a Buy rating on GOOGL stock and a price target of $146 (it closed at $131.53 on Aug. 7). NVIDIA Corporation (NVDA) Check out Full list ​ submitted by /u/Agitated-Spell3979 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/25/2023
    Google DeepMind's new chess engine beats its famous AlphaZero.[1] OpenAI partners with Scale AI to allow companies to fine-tune GPT-3.5.[2] AMD has acquired Mipsology, an AI software company focused on computer interpretations and responses to photos and videos.[3] Former Meta researchers who developed an AI language model for biology have launched a new startup and raised at least $40 million, Forbes has learned.[4] Sources: [1] https://the-decoder.com/google-deepminds-new-chess-engine-beats-its-famous-alphazero/ [2] https://techcrunch.com/2023/08/24/openai-partners-with-scale-ai-to-allow-companies-to-fine-tune-gpt-3-5/ [3] https://www.investopedia.com/amd-acquires-french-ai-software-company-mipsology-7852209 [4] https://www.forbes.com/sites/kenrickcai/2023/08/25/evolutionaryscale-ai-biotech-startup-meta-researchers-funding/?sh=7982a406140c submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    This is so impressive...Freddie Mercury AI as Mickael Jackson - Thriller
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
  • Open

    Upside Down Reinforcement Learning Implementation
    I recently implemented UDRL and just published it. If anyone finds it useful, feel free to check it out: https://github.com/mphe/upside-down-reinforcement-learning. There are many other implementations out there, but most of them are difficult to extend and maintain, due to being written in a sloppy manner, or are incorrect, e.g. not using multiplicative interactions or contain smaller bugs and issues. This project aims to fix these issues, while potentially improving performance, providing a proper OOP interface, and reusing code from Stable Baselines 3 where applicable. Furthermore, the algorithm has been extended to support additional features, like multi-threading, which speeds up the training time immensely. It also provides an interface similar to SB algorithms, so it can be used mostly analogously. For more information, see the Github page. Contributions are welcome! submitted by /u/mphe_ [link] [comments]  ( 9 min )
    Multi-Agent RL where agents' actions are dependent on nearby agent's actions
    I am working to design an multi-agent reinforcement learning agent, where the agents that are spatially close are connected and the information is shared, which will be done through a convolution process. However, when convoluting the nearby agents' observations, I also wish nearby agents' action values to be part of the local observation that will be convoluted, however this would cause a dilemma as for an agent to choose and action, it will have to know other agents' actions but the other agents would have to know this agent's value for deciding the value. Are there MARL methods that can help me fix this problem? submitted by /u/LeSUTHU [link] [comments]  ( 9 min )
    Advice on understanding intuition behind RL algorithms.
    I am trying to understand Policy Iteration from the book "Reinforcement learning an introduction". I understood the pseudo code and applied it using python. But still I feel like I don't have a intuitive understanding of Policy Iteration. Like why it works? I know how it works. Any advice on how to get an intuitive understanding of RL algorithms? I reread the policy iteration multiple times, but still feel like I don't understand it. submitted by /u/mono1110 [link] [comments]  ( 9 min )
  • Open

    A small programming language
    Paul Graham said “Programming languages teach you not to want what they don’t provide.” He meant that as a negative: programmers using less expressive languages don’t know what they’re missing. But you could also take that as a positive: using a simple language can teach you that you don’t need features you thought you needed. […] A small programming language first appeared on John D. Cook.  ( 6 min )
    Quadrature rules and an impossibility theorem
    Many numerical integration formulas over a finite interval have the form That is, the integral on the left can be approximated by evaluating the integrand f at particular nodes and taking the weighted sum, and the error is some multiple of a derivative of f evaluated at a point in the interval [a, b]. This […] Quadrature rules and an impossibility theorem first appeared on John D. Cook.  ( 6 min )
  • Open

    Twitter Community / Machine Learning
    submitted by /u/x9182 [link] [comments]  ( 9 min )
    Zoomposium with Professor Dr. John-Dylan Haynes: "In search of the code of the brain"
    Zoomposium with Professor Dr. John-Dylan Haynes: "In search of the code of the brain" In this new episode of our "Zoomposium Series" on the topic of "Brain Research", my colleague Axel Stöcker from the "Blog der großen Fragen" and I have managed to win the well-known and renowned brain researcher and psychologist Professor Dr. John-Dylan Haynes for an interview. John-Dylan Haynes has been a professor of theory and analysis of long-range brain signals at the Bernstein Center for Computational Neuroscience and the Berlin Center for Advanced Neuroimaging (BCAN) at Charité and Humboldt University in Berlin since 2006. There, Professor Haynes and his team are "In Search of the Brain's Code". In order to crack this, larger amounts of data are collected from the functional magnetic resonance i…  ( 10 min )
    Deep Neural Nets: 33 years ago and 33 years from now
    submitted by /u/nickb [link] [comments]  ( 9 min )
  • Open

    Probabilistic load forecasting with Reservoir Computing. (arXiv:2308.12844v1 [cs.LG])
    Some applications of deep learning require not only to provide accurate results but also to quantify the amount of confidence in their prediction. The management of an electric power grid is one of these cases: to avoid risky scenarios, decision-makers need both precise and reliable forecasts of, for example, power loads. For this reason, point forecasts are not enough hence it is necessary to adopt methods that provide an uncertainty quantification. This work focuses on reservoir computing as the core time series forecasting method, due to its computational efficiency and effectiveness in predicting time series. While the RC literature mostly focused on point forecasting, this work explores the compatibility of some popular uncertainty quantification methods with the reservoir setting. Both Bayesian and deterministic approaches to uncertainty assessment are evaluated and compared in terms of their prediction accuracy, computational resource efficiency and reliability of the estimated uncertainty, based on a set of carefully chosen performance metrics.
    DLIP: Distilling Language-Image Pre-training. (arXiv:2308.12956v1 [cs.CV])
    Vision-Language Pre-training (VLP) shows remarkable progress with the assistance of extremely heavy parameters, which challenges deployment in real applications. Knowledge distillation is well recognized as the essential procedure in model compression. However, existing knowledge distillation techniques lack an in-depth investigation and analysis of VLP, and practical guidelines for VLP-oriented distillation are still not yet explored. In this paper, we present DLIP, a simple yet efficient Distilling Language-Image Pre-training framework, through which we investigate how to distill a light VLP model. Specifically, we dissect the model distillation from multiple dimensions, such as the architecture characteristics of different modules and the information transfer of different modalities. We conduct comprehensive experiments and provide insights on distilling a light but performant VLP model. Experimental results reveal that DLIP can achieve a state-of-the-art accuracy/efficiency trade-off across diverse cross-modal tasks, e.g., image-text retrieval, image captioning and visual question answering. For example, DLIP compresses BLIP by 1.9x, from 213M to 108M parameters, while achieving comparable or better performance. Furthermore, DLIP succeeds in retaining more than 95% of the performance with 22.4% parameters and 24.8% FLOPs compared to the teacher model and accelerates inference speed by 2.7x.
    Equal Treatment: Measuring Fairness using Explanation Distributions. (arXiv:2303.08040v2 [cs.LG] UPDATED)
    Liberalism-oriented political philosophy reasons that all individuals should be treated equally independently of their protected characteristics. Related work in machine learning has translated the concept of equal treatment into terms of equal outcome and measured it as demographic parity (also called statistical parity). Our analysis reveals that the two concepts of equal outcome and equal treatment diverge; therefore, demographic parity does not faithfully represent the notion of equal treatment. We propose a new formalization for equal treatment by (i) considering the influence of feature values on predictions, such as computed by Shapley values explaining classifications, (ii) defining distributions of explanations, and (iii) comparing explanation distributions between populations with different protected characteristics. We show the theoretical properties of our notion of equal treatment and devise a classifier two-sample test based on the AUC of an equal treatment inspector. We study our formalization of equal treatment on synthetic and natural data. We release explanationspace, an open-source Python package with methods and tutorials.
    Diagnosing Infeasible Optimization Problems Using Large Language Models. (arXiv:2308.12923v1 [cs.HC])
    Decision-making problems can be represented as mathematical optimization models, finding wide applications in fields such as economics, engineering and manufacturing, transportation, and health care. Optimization models are mathematical abstractions of the problem of making the best decision while satisfying a set of requirements or constraints. One of the primary barriers to deploying these models in practice is the challenge of helping practitioners understand and interpret such models, particularly when they are infeasible, meaning no decision satisfies all the constraints. Existing methods for diagnosing infeasible optimization models often rely on expert systems, necessitating significant background knowledge in optimization. In this paper, we introduce OptiChat, a first-of-its-kind natural language-based system equipped with a chatbot GUI for engaging in interactive conversations about infeasible optimization models. OptiChat can provide natural language descriptions of the optimization model itself, identify potential sources of infeasibility, and offer suggestions to make the model feasible. The implementation of OptiChat is built on GPT-4, which interfaces with an optimization solver to identify the minimal subset of constraints that render the entire optimization problem infeasible, also known as the Irreducible Infeasible Subset (IIS). We utilize few-shot learning, expert chain-of-thought, key-retrieve, and sentiment prompts to enhance OptiChat's reliability. Our experiments demonstrate that OptiChat assists both expert and non-expert users in improving their understanding of the optimization models, enabling them to quickly identify the sources of infeasibility.
    An Accelerated Block Proximal Framework with Adaptive Momentum for Nonconvex and Nonsmooth Optimization. (arXiv:2308.12126v2 [math.OC] UPDATED)
    We propose an accelerated block proximal linear framework with adaptive momentum (ABPL$^+$) for nonconvex and nonsmooth optimization. We analyze the potential causes of the extrapolation step failing in some algorithms, and resolve this issue by enhancing the comparison process that evaluates the trade-off between the proximal gradient step and the linear extrapolation step in our algorithm. Furthermore, we extends our algorithm to any scenario involving updating block variables with positive integers, allowing each cycle to randomly shuffle the update order of the variable blocks. Additionally, under mild assumptions, we prove that ABPL$^+$ can monotonically decrease the function value without strictly restricting the extrapolation parameters and step size, demonstrates the viability and effectiveness of updating these blocks in a random order, and we also more obviously and intuitively demonstrate that the derivative set of the sequence generated by our algorithm is a critical point set. Moreover, we demonstrate the global convergence as well as the linear and sublinear convergence rates of our algorithm by utilizing the Kurdyka-Lojasiewicz (K{\L}) condition. To enhance the effectiveness and flexibility of our algorithm, we also expand the study to the imprecise version of our algorithm and construct an adaptive extrapolation parameter strategy, which improving its overall performance. We apply our algorithm to multiple non-negative matrix factorization with the $\ell_0$ norm, nonnegative tensor decomposition with the $\ell_0$ norm, and perform extensive numerical experiments to validate its effectiveness and efficiency.
    Unsupervised Manifold Linearizing and Clustering. (arXiv:2301.01805v2 [cs.LG] UPDATED)
    We consider the problem of simultaneously clustering and learning a linear representation of data lying close to a union of low-dimensional manifolds, a fundamental task in machine learning and computer vision. When the manifolds are assumed to be linear subspaces, this reduces to the classical problem of subspace clustering, which has been studied extensively over the past two decades. Unfortunately, many real-world datasets such as natural images can not be well approximated by linear subspaces. On the other hand, numerous works have attempted to learn an appropriate transformation of the data, such that data is mapped from a union of general non-linear manifolds to a union of linear subspaces (with points from the same manifold being mapped to the same subspace). However, many existing works have limitations such as assuming knowledge of the membership of samples to clusters, requiring high sampling density, or being shown theoretically to learn trivial representations. In this paper, we propose to optimize the Maximal Coding Rate Reduction metric with respect to both the data representation and a novel doubly stochastic cluster membership, inspired by state-of-the-art subspace clustering results. We give a parameterization of such a representation and membership, allowing efficient mini-batching and one-shot initialization. Experiments on CIFAR-10, -20, -100, and TinyImageNet-200 datasets show that the proposed method is much more accurate and scalable than state-of-the-art deep clustering methods, and further learns a latent linear representation of the data.
    Wasserstein Geodesic Generator for Conditional Distributions. (arXiv:2308.10145v2 [stat.ML] UPDATED)
    Generating samples given a specific label requires estimating conditional distributions. We derive a tractable upper bound of the Wasserstein distance between conditional distributions to lay the theoretical groundwork to learn conditional distributions. Based on this result, we propose a novel conditional generation algorithm where conditional distributions are fully characterized by a metric space defined by a statistical distance. We employ optimal transport theory to propose the Wasserstein geodesic generator, a new conditional generator that learns the Wasserstein geodesic. The proposed method learns both conditional distributions for observed domains and optimal transport maps between them. The conditional distributions given unobserved intermediate domains are on the Wasserstein geodesic between conditional distributions given two observed domain labels. Experiments on face images with light conditions as domain labels demonstrate the efficacy of the proposed method.
    FlexFringe: Modeling Software Behavior by Learning Probabilistic Automata. (arXiv:2203.16331v3 [cs.LG] UPDATED)
    We present the efficient implementations of probabilistic deterministic finite automaton learning methods available in FlexFringe. These implement well-known strategies for state-merging including several modifications to improve their performance in practice. We show experimentally that these algorithms obtain competitive results and significant improvements over a default implementation. We also demonstrate how to use FlexFringe to learn interpretable models from software logs and use these for anomaly detection. Although less interpretable, we show that learning smaller more convoluted models improves the performance of FlexFringe on anomaly detection, outperforming an existing solution based on neural nets.
    Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition. (arXiv:2308.12673v1 [cs.CV])
    In this paper, we introduce Masked Feature Modelling (MFM), a novel approach for the unsupervised pre-training of a Graph Attention Network (GAT) block. MFM utilizes a pretrained Visual Tokenizer to reconstruct masked features of objects within a video, leveraging the MiniKinetics dataset. We then incorporate the pre-trained GAT block into a state-of-the-art bottom-up supervised video-event recognition architecture, ViGAT, to improve the model's starting point and overall accuracy. Experimental evaluations on the YLI-MED dataset demonstrate the effectiveness of MFM in improving event recognition performance.
    LR-XFL: Logical Reasoning-based Explainable Federated Learning. (arXiv:2308.12681v1 [cs.AI])
    Federated learning (FL) is an emerging approach for training machine learning models collaboratively while preserving data privacy. The need for privacy protection makes it difficult for FL models to achieve global transparency and explainability. To address this limitation, we incorporate logic-based explanations into FL by proposing the Logical Reasoning-based eXplainable Federated Learning (LR-XFL) approach. Under LR-XFL, FL clients create local logic rules based on their local data and send them, along with model updates, to the FL server. The FL server connects the local logic rules through a proper logical connector that is derived based on properties of client data, without requiring access to the raw data. In addition, the server also aggregates the local model updates with weight values determined by the quality of the clients' local data as reflected by their uploaded logic rules. The results show that LR-XFL outperforms the most relevant baseline by 1.19%, 5.81% and 5.41% in terms of classification accuracy, rule accuracy and rule fidelity, respectively. The explicit rule evaluation and expression under LR-XFL enable human experts to validate and correct the rules on the server side, hence improving the global FL model's robustness to errors. It has the potential to enhance the transparency of FL models for areas like healthcare and finance where both data privacy and explainability are important.
    Towards Automated Animal Density Estimation with Acoustic Spatial Capture-Recapture. (arXiv:2308.12859v1 [cs.SD])
    Passive acoustic monitoring can be an effective way of monitoring wildlife populations that are acoustically active but difficult to survey visually. Digital recorders allow surveyors to gather large volumes of data at low cost, but identifying target species vocalisations in these data is non-trivial. Machine learning (ML) methods are often used to do the identification. They can process large volumes of data quickly, but they do not detect all vocalisations and they do generate some false positives (vocalisations that are not from the target species). Existing wildlife abundance survey methods have been designed specifically to deal with the first of these mistakes, but current methods of dealing with false positives are not well-developed. They do not take account of features of individual vocalisations, some of which are more likely to be false positives than others. We propose three methods for acoustic spatial capture-recapture inference that integrate individual-level measures of confidence from ML vocalisation identification into the likelihood and hence integrate ML uncertainty into inference. The methods include a mixture model in which species identity is a latent variable. We test the methods by simulation and find that in a scenario based on acoustic data from Hainan gibbons, in which ignoring false positives results in 17% positive bias, our methods give negligible bias and coverage probabilities that are close to the nominal 95% level.
    Leveraging Global Binary Masks for Structure Segmentation in Medical Images. (arXiv:2205.09107v2 [eess.IV] UPDATED)
    Deep learning (DL) models for medical image segmentation are highly influenced by intensity variations of input images and lack generalization due to primarily utilizing pixels' intensity information for inference. Acquiring sufficient training data is another challenge limiting models' applications. We proposed to leverage the consistency of organs' anatomical shape and position information in medical images. We introduced a framework leveraging recurring anatomical patterns through global binary masks for organ segmentation. Two scenarios were studied.1) Global binary masks were the only model's (i.e. U-Net) input, forcing exclusively encoding organs' position and shape information for segmentation/localization.2) Global binary masks were incorporated as an additional channel functioning as position/shape clues to mitigate training data scarcity. Two datasets of the brain and heart CT images with their ground-truth were split into (26:10:10) and (12:3:5) for training, validation, and test respectively. Training exclusively on global binary masks led to Dice scores of 0.77(0.06) and 0.85(0.04), with the average Euclidian distance of 3.12(1.43)mm and 2.5(0.93)mm relative to the center of mass of the ground truth for the brain and heart structures respectively. The outcomes indicate that a surprising degree of position and shape information is encoded through global binary masks. Incorporating global binary masks led to significantly higher accuracy relative to the model trained on only CT images in small subsets of training data; the performance improved by 4.3-125.3% and 1.3-48.1% for 1-8 training cases of the brain and heart datasets respectively. The findings imply the advantages of utilizing global binary masks for building generalizable models and to compensate for training data scarcity.
    Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion. (arXiv:2308.12734v1 [cs.SD])
    There are growing implications surrounding generative AI in the speech domain that enable voice cloning and real-time voice conversion from one individual to another. This technology poses a significant ethical threat and could lead to breaches of privacy and misrepresentation, thus there is an urgent need for real-time detection of AI-generated speech for DeepFake Voice Conversion. To address the above emerging issues, the DEEP-VOICE dataset is generated in this study, comprised of real human speech from eight well-known figures and their speech converted to one another using Retrieval-based Voice Conversion. Presenting as a binary classification problem of whether the speech is real or AI-generated, statistical analysis of temporal audio features through t-testing reveals that there are significantly different distributions. Hyperparameter optimisation is implemented for machine learning models to identify the source of speech. Following the training of 208 individual machine learning models over 10-fold cross validation, it is found that the Extreme Gradient Boosting model can achieve an average classification accuracy of 99.3% and can classify speech in real-time, at around 0.004 milliseconds given one second of speech. All data generated for this study is released publicly for future research on AI speech detection.
    Individual Privacy Accounting with Gaussian Differential Privacy. (arXiv:2209.15596v2 [cs.CR] UPDATED)
    Individual privacy accounting enables bounding differential privacy (DP) loss individually for each participant involved in the analysis. This can be informative as often the individual privacy losses are considerably smaller than those indicated by the DP bounds that are based on considering worst-case bounds at each data access. In order to account for the individual privacy losses in a principled manner, we need a privacy accountant for adaptive compositions of randomised mechanisms, where the loss incurred at a given data access is allowed to be smaller than the worst-case loss. This kind of analysis has been carried out for the R\'enyi differential privacy (RDP) by Feldman and Zrnic (2021), however not yet for the so-called optimal privacy accountants. We make first steps in this direction by providing a careful analysis using the Gaussian differential privacy which gives optimal bounds for the Gaussian mechanism, one of the most versatile DP mechanisms. This approach is based on determining a certain supermartingale for the hockey-stick divergence and on extending the R\'enyi divergence-based fully adaptive composition results by Feldman and Zrnic. We also consider measuring the individual $(\varepsilon,\delta)$-privacy losses using the so-called privacy loss distributions. With the help of the Blackwell theorem, we can then make use of the RDP analysis to construct an approximative individual $(\varepsilon,\delta)$-accountant.
    Constrained Stein Variational Trajectory Optimization. (arXiv:2308.12110v1 [cs.RO] CROSS LISTED)
    We present Constrained Stein Variational Trajectory Optimization (CSVTO), an algorithm for performing trajectory optimization with constraints on a set of trajectories in parallel. We frame constrained trajectory optimization as a novel form of constrained functional minimization over trajectory distributions, which avoids treating the constraints as a penalty in the objective and allows us to generate diverse sets of constraint-satisfying trajectories. Our method uses Stein Variational Gradient Descent (SVGD) to find a set of particles that approximates a distribution over low-cost trajectories while obeying constraints. CSVTO is applicable to problems with arbitrary equality and inequality constraints and includes a novel particle resampling step to escape local minima. By explicitly generating diverse sets of trajectories, CSVTO is better able to avoid poor local minima and is more robust to initialization. We demonstrate that CSVTO outperforms baselines in challenging highly-constrained tasks, such as a 7DoF wrench manipulation task, where CSVTO succeeds in 20/20 trials vs 13/20 for the closest baseline. Our results demonstrate that generating diverse constraint-satisfying trajectories improves robustness to disturbances and initialization over baselines.
    Persistent learning signals and working memory without continuous attractors. (arXiv:2308.12585v1 [q-bio.NC])
    Neural dynamical systems with stable attractor structures, such as point attractors and continuous attractors, are hypothesized to underlie meaningful temporal behavior that requires working memory. However, working memory may not support useful learning signals necessary to adapt to changes in the temporal structure of the environment. We show that in addition to the continuous attractors that are widely implicated, periodic and quasi-periodic attractors can also support learning arbitrarily long temporal relationships. Unlike the continuous attractors that suffer from the fine-tuning problem, the less explored quasi-periodic attractors are uniquely qualified for learning to produce temporally structured behavior. Our theory has broad implications for the design of artificial learning systems and makes predictions about observable signatures of biological neural dynamics that can support temporal dependence learning and working memory. Based on our theory, we developed a new initialization scheme for artificial recurrent neural networks that outperforms standard methods for tasks that require learning temporal dynamics. Moreover, we propose a robust recurrent memory mechanism for integrating and maintaining head direction without a ring attractor.
    To Compress or Not to Compress- Self-Supervised Learning and Information Theory: A Review. (arXiv:2304.09355v4 [cs.LG] UPDATED)
    \begin{abstract} Deep neural networks excel in supervised learning tasks but are constrained by the need for extensive labeled data. Self-supervised learning emerges as a promising alternative, allowing models to learn without explicit labels. Information theory, and notably the information bottleneck principle, has been pivotal in shaping deep neural networks. This principle focuses on optimizing the trade-off between compression and preserving relevant information, providing a foundation for efficient network design in supervised contexts. However, its precise role and adaptation in self-supervised learning remain unclear. In this work, we scrutinize various self-supervised learning approaches from an information-theoretic perspective, introducing a unified framework that encapsulates the self-supervised information-theoretic learning problem. We weave together existing research into a cohesive narrative, delve into contemporary self-supervised methodologies, and spotlight potential research avenues and inherent challenges. Additionally, we discuss the empirical evaluation of information-theoretic quantities and their estimation methods. Overall, this paper furnishes an exhaustive review of the intersection of information theory, self-supervised learning, and deep neural networks.
    Natural Language is All a Graph Needs. (arXiv:2308.07134v3 [cs.CL] UPDATED)
    The emergence of large-scale pre-trained language models, such as ChatGPT, has revolutionized various research fields in artificial intelligence. Transformers-based large language models (LLMs) have gradually replaced CNNs and RNNs to unify fields of computer vision and natural language processing. Compared with the data that exists relatively independently such as images, videos or texts, graph is a type of data that contains rich structural and relational information. Meanwhile, natural language, as one of the most expressive mediums, excels in describing complex structures. However, existing work on incorporating graph learning problems into the generative language modeling framework remains very limited. As the importance of large language models continues to grow, it becomes essential to explore whether LLMs can also replace GNNs as the foundation model for graphs. In this paper, we propose InstructGLM (Instruction-finetuned Graph Language Model), systematically design highly scalable prompts based on natural language instructions, and use natural language to describe the geometric structure and node features of the graph for instruction tuning an LLM to perform learning and inference on graphs in a generative manner. Our method exceeds all competitive GNN baselines on ogbn-arxiv, Cora and PubMed datasets, which demonstrates the effectiveness of our method and sheds light on generative large language models as the foundation model for graph machine learning.
    POLCA: Power Oversubscription in LLM Cloud Providers. (arXiv:2308.12908v1 [cs.DC])
    Recent innovation in large language models (LLMs), and their myriad use-cases have rapidly driven up the compute capacity demand for datacenter GPUs. Several cloud providers and other enterprises have made substantial plans of growth in their datacenters to support these new workloads. One of the key bottleneck resources in datacenters is power, and given the increasing model sizes of LLMs, they are becoming increasingly power intensive. In this paper, we show that there is a significant opportunity to oversubscribe power in LLM clusters. Power oversubscription improves the power efficiency of these datacenters, allowing more deployable servers per datacenter, and reduces the deployment time, since building new datacenters is slow. We extensively characterize the power consumption patterns of a variety of LLMs and their configurations. We identify the differences between the inference and training power consumption patterns. Based on our analysis of these LLMs, we claim that the average and peak power utilization in LLM clusters for inference should not be very high. Our deductions align with the data from production LLM clusters, revealing that inference workloads offer substantial headroom for power oversubscription. However, the stringent set of telemetry and controls that GPUs offer in a virtualized environment, makes it challenging to have a reliable and robust power oversubscription mechanism. We propose POLCA, our framework for power oversubscription that is robust, reliable, and readily deployable for GPU clusters. Using open-source models to replicate the power patterns observed in production, we simulate POLCA and demonstrate that we can deploy 30% more servers in the same GPU cluster for inference, with minimal performance loss
    Self-Supervised Training with Autoencoders for Visual Anomaly Detection. (arXiv:2206.11723v4 [cs.CV] UPDATED)
    Deep autoencoders provide an effective tool for learning non-linear dimensionality reduction in an unsupervised way. Recently, they have been used for the task of anomaly detection in the visual domain. By optimizing for the reconstruction error using anomaly-free examples, the common belief is that a corresponding network should fail to accurately reconstruct anomalous regions in the application phase. This goal is typically addressed by controlling the capacity of the network, either by reducing the size of the bottleneck layer or by enforcing sparsity constraints on the activations. However, neither of these techniques does explicitly penalize reconstruction of anomalous signals often resulting in poor detection. We tackle this problem by adapting a self-supervised learning regime that allows the use of discriminative information during training but focuses on the data manifold of normal examples. We emphasize that inference with our approach is very efficient during training and prediction requiring a single forward pass for each input image. Our experiments on the MVTec AD dataset demonstrate high detection and localization performance. On the texture-subset, in particular, our approach consistently outperforms recent anomaly detection methods by a significant margin.
    Beyond Document Page Classification: Design, Datasets, and Challenges. (arXiv:2308.12896v1 [cs.CV])
    This paper highlights the need to bring document classification benchmarking closer to real-world applications, both in the nature of data tested ($X$: multi-channel, multi-paged, multi-industry; $Y$: class distributions and label set variety) and in classification tasks considered ($f$: multi-page document, page stream, and document bundle classification, ...). We identify the lack of public multi-page document classification datasets, formalize different classification tasks arising in application scenarios, and motivate the value of targeting efficient multi-page document representations. An experimental study on proposed multi-page document classification datasets demonstrates that current benchmarks have become irrelevant and need to be updated to evaluate complete documents, as they naturally occur in practice. This reality check also calls for more mature evaluation methodologies, covering calibration evaluation, inference complexity (time-memory), and a range of realistic distribution shifts (e.g., born-digital vs. scanning noise, shifting page order). Our study ends on a hopeful note by recommending concrete avenues for future improvements.}
    Exact Bayesian Inference on Discrete Models via Probability Generating Functions: A Probabilistic Programming Approach. (arXiv:2305.17058v2 [cs.PL] UPDATED)
    We present an exact Bayesian inference method for discrete statistical models, which can find exact solutions to many discrete inference problems, even with infinite support and continuous priors. To express such models, we introduce a probabilistic programming language that supports discrete and continuous sampling, discrete observations, affine functions, (stochastic) branching, and conditioning on events. Our key tool is probability generating functions: they provide a compact closed-form representation of distributions that are definable by programs, thus enabling the exact computation of posterior probabilities, expectation, variance, and higher moments. Our inference method is provably correct, fully automated and uses automatic differentiation (specifically, Taylor polynomials), but does not require computer algebra. Our experiments show that its performance on a range of real-world examples is competitive with approximate Monte Carlo methods, while avoiding approximation errors.
    Pareto Invariant Representation Learning for Multimedia Recommendation. (arXiv:2308.04706v2 [cs.IR] UPDATED)
    Multimedia recommendation involves personalized ranking tasks, where multimedia content is usually represented using a generic encoder. However, these generic representations introduce spurious correlations that fail to reveal users' true preferences. Existing works attempt to alleviate this problem by learning invariant representations, but overlook the balance between independent and identically distributed (IID) and out-of-distribution (OOD) generalization. In this paper, we propose a framework called Pareto Invariant Representation Learning (PaInvRL) to mitigate the impact of spurious correlations from an IID-OOD multi-objective optimization perspective, by learning invariant representations (intrinsic factors that attract user attention) and variant representations (other factors) simultaneously. Specifically, PaInvRL includes three iteratively executed modules: (i) heterogeneous identification module, which identifies the heterogeneous environments to reflect distributional shifts for user-item interactions; (ii) invariant mask generation module, which learns invariant masks based on the Pareto-optimal solutions that minimize the adaptive weighted Invariant Risk Minimization (IRM) and Empirical Risk (ERM) losses; (iii) convert module, which generates both variant representations and item-invariant representations for training a multi-modal recommendation model that mitigates spurious correlations and balances the generalization performance within and cross the environmental distributions. We compare the proposed PaInvRL with state-of-the-art recommendation models on three public multimedia recommendation datasets (Movielens, Tiktok, and Kwai), and the experimental results validate the effectiveness of PaInvRL for both within- and cross-environmental learning.
    CDAN: Convolutional Dense Attention-guided Network for Low-light Image Enhancement. (arXiv:2308.12902v1 [cs.CV])
    Low-light images, characterized by inadequate illumination, pose challenges of diminished clarity, muted colors, and reduced details. Low-light image enhancement, an essential task in computer vision, aims to rectify these issues by improving brightness, contrast, and overall perceptual quality, thereby facilitating accurate analysis and interpretation. This paper introduces the Convolutional Dense Attention-guided Network (CDAN), a novel solution for enhancing low-light images. CDAN integrates an autoencoder-based architecture with convolutional and dense blocks, complemented by an attention mechanism and skip connections. This architecture ensures efficient information propagation and feature learning. Furthermore, a dedicated post-processing phase refines color balance and contrast. Our approach demonstrates notable progress compared to state-of-the-art results in low-light image enhancement, showcasing its robustness across a wide range of challenging scenarios. Our model performs remarkably on benchmark datasets, effectively mitigating under-exposure and proficiently restoring textures and colors in diverse low-light scenarios. This achievement underscores CDAN's potential for diverse computer vision tasks, notably enabling robust object detection and recognition in challenging low-light conditions.
    Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation with a Unified Model. (arXiv:2303.06965v3 [cs.LG] UPDATED)
    Chemical reactions are the fundamental building blocks of drug design and organic chemistry research. In recent years, there has been a growing need for a large-scale deep-learning framework that can efficiently capture the basic rules of chemical reactions. In this paper, we have proposed a unified framework that addresses both the reaction representation learning and molecule generation tasks, which allows for a more holistic approach. Inspired by the organic chemistry mechanism, we develop a novel pretraining framework that enables us to incorporate inductive biases into the model. Our framework achieves state-of-the-art results on challenging downstream tasks. By possessing chemical knowledge, our generative framework overcome the limitations of current molecule generation models that rely on a small number of reaction templates. In the extensive experiments, our model generates synthesizable drug-like structures of high quality. Overall, our work presents a significant step toward a large-scale deep-learning framework for a variety of reaction-based applications.
    A Survey on Dataset Distillation: Approaches, Applications and Future Directions. (arXiv:2305.01975v3 [cs.LG] UPDATED)
    Dataset distillation is attracting more attention in machine learning as training sets continue to grow and the cost of training state-of-the-art models becomes increasingly high. By synthesizing datasets with high information density, dataset distillation offers a range of potential applications, including support for continual learning, neural architecture search, and privacy protection. Despite recent advances, we lack a holistic understanding of the approaches and applications. Our survey aims to bridge this gap by first proposing a taxonomy of dataset distillation, characterizing existing approaches, and then systematically reviewing the data modalities, and related applications. In addition, we summarize the challenges and discuss future directions for this field of research.
    Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction: A Unified Library and Performance Benchmark. (arXiv:2304.14343v5 [cs.LG] UPDATED)
    As deep learning technology advances and more urban spatial-temporal data accumulates, an increasing number of deep learning models are being proposed to solve urban spatial-temporal prediction problems. However, there are limitations in the existing field, including open-source data being in various formats and difficult to use, few papers making their code and data openly available, and open-source models often using different frameworks and platforms, making comparisons challenging. A standardized framework is urgently needed to implement and evaluate these methods. To address these issues, we provide a comprehensive review of urban spatial-temporal prediction and propose a unified storage format for spatial-temporal data called atomic files. We also propose LibCity, an open-source library that offers researchers a credible experimental tool and a convenient development framework. In this library, we have reproduced 65 spatial-temporal prediction models and collected 55 spatial-temporal datasets, allowing researchers to conduct comprehensive experiments conveniently. Using LibCity, we conducted a series of experiments to validate the effectiveness of different models and components, and we summarized promising future technology developments and research directions for spatial-temporal prediction. By enabling fair model comparisons, designing a unified data storage format, and simplifying the process of developing new models, LibCity is poised to make significant contributions to the spatial-temporal prediction field.
    Fast Adversarial Training with Smooth Convergence. (arXiv:2308.12857v1 [cs.LG])
    Fast adversarial training (FAT) is beneficial for improving the adversarial robustness of neural networks. However, previous FAT work has encountered a significant issue known as catastrophic overfitting when dealing with large perturbation budgets, \ie the adversarial robustness of models declines to near zero during training. To address this, we analyze the training process of prior FAT work and observe that catastrophic overfitting is accompanied by the appearance of loss convergence outliers. Therefore, we argue a moderately smooth loss convergence process will be a stable FAT process that solves catastrophic overfitting. To obtain a smooth loss convergence process, we propose a novel oscillatory constraint (dubbed ConvergeSmooth) to limit the loss difference between adjacent epochs. The convergence stride of ConvergeSmooth is introduced to balance convergence and smoothing. Likewise, we design weight centralization without introducing additional hyperparameters other than the loss balance coefficient. Our proposed methods are attack-agnostic and thus can improve the training stability of various FAT techniques. Extensive experiments on popular datasets show that the proposed methods efficiently avoid catastrophic overfitting and outperform all previous FAT methods. Code is available at \url{https://github.com/FAT-CS/ConvergeSmooth}.
    The SWAX Benchmark: Attacking Biometric Systems with Wax Figures. (arXiv:1910.09642v1 [cs.CV] CROSS LISTED)
    A face spoofing attack occurs when an intruder attempts to impersonate someone who carries a gainful authentication clearance. It is a trending topic due to the increasing demand for biometric authentication on mobile devices, high-security areas, among others. This work introduces a new database named Sense Wax Attack dataset (SWAX), comprised of real human and wax figure images and videos that endorse the problem of face spoofing detection. The dataset consists of more than 1800 face images and 110 videos of 55 people/waxworks, arranged in training, validation and test sets with a large range in expression, illumination and pose variations. Experiments performed with baseline methods show that despite the progress in recent years, advanced spoofing methods are still vulnerable to high-quality violation attempts.
    Open-set Face Recognition using Ensembles trained on Clustered Data. (arXiv:2308.07445v1 [cs.CV] CROSS LISTED)
    Open-set face recognition describes a scenario where unknown subjects, unseen during the training stage, appear on test time. Not only it requires methods that accurately identify individuals of interest, but also demands approaches that effectively deal with unfamiliar faces. This work details a scalable open-set face identification approach to galleries composed of hundreds and thousands of subjects. It is composed of clustering and an ensemble of binary learning algorithms that estimates when query face samples belong to the face gallery and then retrieves their correct identity. The approach selects the most suitable gallery subjects and uses the ensemble to improve prediction performance. We carry out experiments on well-known LFW and YTF benchmarks. Results show that competitive performance can be achieved even when targeting scalability.
    Improving Sample Quality of Diffusion Models Using Self-Attention Guidance. (arXiv:2210.00939v6 [cs.CV] UPDATED)
    Denoising diffusion models (DDMs) have attracted attention for their exceptional generation quality and diversity. This success is largely attributed to the use of class- or text-conditional diffusion guidance methods, such as classifier and classifier-free guidance. In this paper, we present a more comprehensive perspective that goes beyond the traditional guidance methods. From this generalized perspective, we introduce novel condition- and training-free strategies to enhance the quality of generated images. As a simple solution, blur guidance improves the suitability of intermediate samples for their fine-scale information and structures, enabling diffusion models to generate higher quality samples with a moderate guidance scale. Improving upon this, Self-Attention Guidance (SAG) uses the intermediate self-attention maps of diffusion models to enhance their stability and efficacy. Specifically, SAG adversarially blurs only the regions that diffusion models attend to at each iteration and guides them accordingly. Our experimental results show that our SAG improves the performance of various diffusion models, including ADM, IDDPM, Stable Diffusion, and DiT. Moreover, combining SAG with conventional guidance methods leads to further improvement.
    HyperTab: Hypernetwork Approach for Deep Learning on Small Tabular Datasets. (arXiv:2304.03543v2 [cs.LG] UPDATED)
    Deep learning has achieved impressive performance in many domains, such as computer vision and natural language processing, but its advantage over classical shallow methods on tabular datasets remains questionable. It is especially challenging to surpass the performance of tree-like ensembles, such as XGBoost or Random Forests, on small-sized datasets (less than 1k samples). To tackle this challenge, we introduce HyperTab, a hypernetwork-based approach to solving small sample problems on tabular datasets. By combining the advantages of Random Forests and neural networks, HyperTab generates an ensemble of neural networks, where each target model is specialized to process a specific lower-dimensional view of the data. Since each view plays the role of data augmentation, we virtually increase the number of training samples while keeping the number of trainable parameters unchanged, which prevents model overfitting. We evaluated HyperTab on more than 40 tabular datasets of a varying number of samples and domains of origin, and compared its performance with shallow and deep learning models representing the current state-of-the-art. We show that HyperTab consistently outranks other methods on small data (with a statistically significant difference) and scores comparable to them on larger datasets. We make a python package with the code available to download at https://pypi.org/project/hypertab/
    Don't Look into the Sun: Adversarial Solarization Attacks on Image Classifiers. (arXiv:2308.12661v1 [cs.CV])
    Assessing the robustness of deep neural networks against out-of-distribution inputs is crucial, especially in safety-critical domains like autonomous driving, but also in safety systems where malicious actors can digitally alter inputs to circumvent safety guards. However, designing effective out-of-distribution tests that encompass all possible scenarios while preserving accurate label information is a challenging task. Existing methodologies often entail a compromise between variety and constraint levels for attacks and sometimes even both. In a first step towards a more holistic robustness evaluation of image classification models, we introduce an attack method based on image solarization that is conceptually straightforward yet avoids jeopardizing the global structure of natural images independent of the intensity. Through comprehensive evaluations of multiple ImageNet models, we demonstrate the attack's capacity to degrade accuracy significantly, provided it is not integrated into the training augmentations. Interestingly, even then, no full immunity to accuracy deterioration is achieved. In other settings, the attack can often be simplified into a black-box attack with model-independent parameters. Defenses against other corruptions do not consistently extend to be effective against our specific attack. Project website: https://github.com/paulgavrikov/adversarial_solarization
    Conformal Prediction Regions for Time Series using Linear Complementarity Programming. (arXiv:2304.01075v3 [eess.SY] UPDATED)
    Conformal prediction is a statistical tool for producing prediction regions of machine learning models that are valid with high probability. However, applying conformal prediction to time series data leads to conservative prediction regions. In fact, to obtain prediction regions over $T$ time steps with confidence $1-\delta$, {previous works require that each individual prediction region is valid} with confidence $1-\delta/T$. We propose an optimization-based method for reducing this conservatism to enable long horizon planning and verification when using learning-enabled time series predictors. Instead of considering prediction errors individually at each time step, we consider a parameterized prediction error over multiple time steps. By optimizing the parameters over an additional dataset, we find prediction regions that are not conservative. We show that this problem can be cast as a mixed integer linear complementarity program (MILCP), which we then relax into a linear complementarity program (LCP). Additionally, we prove that the relaxed LP has the same optimal cost as the original MILCP. Finally, we demonstrate the efficacy of our method on case studies using pedestrian trajectory predictors and F16 fighter jet altitude predictors.
    On Uniformly Optimal Algorithms for Best Arm Identification in Two-Armed Bandits with Fixed Budget. (arXiv:2308.12000v2 [stat.ML] UPDATED)
    We study the problem of best-arm identification with fixed budget in stochastic two-arm bandits with Bernoulli rewards. We prove that surprisingly, there is no algorithm that (i) performs as well as the algorithm sampling each arm equally (this algorithm is referred to as the {\it uniform sampling} algorithm) on all instances, and that (ii) strictly outperforms this algorithm on at least one instance. In short, there is no algorithm better than the uniform sampling algorithm. Towards this result, we introduce the natural class of {\it consistent} and {\it stable} algorithms, and show that any algorithm that performs as well as the uniform sampling algorithm on all instances belongs to this class. The proof is completed by deriving a lower bound on the error rate satisfied by any consistent and stable algorithm, and by showing that the uniform sampling algorithm matches this lower bound. Our results provide a solution to the two open problems presented in \cite{qin2022open}.
    Optimal data pooling for shared learning in maintenance operations. (arXiv:2308.12670v1 [cs.LG])
    This paper addresses the benefits of pooling data for shared learning in maintenance operations. We consider a set of systems subject to Poisson degradation that are coupled through an a-priori unknown rate. Decision problems involving these systems are high-dimensional Markov decision processes (MDPs). We present a decomposition result that reduces such an MDP to two-dimensional MDPs, enabling structural analyses and computations. We leverage this decomposition to demonstrate that pooling data can lead to significant cost reductions compared to not pooling.
    Hypergraph Convolutional Networks for Fine-grained ICU Patient Similarity Analysis and Risk Prediction. (arXiv:2308.12575v1 [cs.LG])
    The Intensive Care Unit (ICU) is one of the most important parts of a hospital, which admits critically ill patients and provides continuous monitoring and treatment. Various patient outcome prediction methods have been attempted to assist healthcare professionals in clinical decision-making. Existing methods focus on measuring the similarity between patients using deep neural networks to capture the hidden feature structures. However, the higher-order relationships are ignored, such as patient characteristics (e.g., diagnosis codes) and their causal effects on downstream clinical predictions. In this paper, we propose a novel Hypergraph Convolutional Network that allows the representation of non-pairwise relationships among diagnosis codes in a hypergraph to capture the hidden feature structures so that fine-grained patient similarity can be calculated for personalized mortality risk prediction. Evaluation using a publicly available eICU Collaborative Research Database indicates that our method achieves superior performance over the state-of-the-art models on mortality risk prediction. Moreover, the results of several case studies demonstrated the effectiveness of constructing graph networks in providing good transparency and robustness in decision-making.
    Multi-fidelity Fourier Neural Operator for Fast Modeling of Large-Scale Geological Carbon Storage. (arXiv:2308.09113v2 [stat.ML] UPDATED)
    Deep learning-based surrogate models have been widely applied in geological carbon storage (GCS) problems to accelerate the prediction of reservoir pressure and CO2 plume migration. Large amounts of data from physics-based numerical simulators are required to train a model to accurately predict the complex physical behaviors associated with this process. In practice, the available training data are always limited in large-scale 3D problems due to the high computational cost. Therefore, we propose to use a multi-fidelity Fourier Neural Operator to solve large-scale GCS problems with more affordable multi-fidelity training datasets. The Fourier Neural Operator has a desirable grid-invariant property, which simplifies the transfer learning procedure between datasets with different discretization. We first test the model efficacy on a GCS reservoir model being discretized into 110k grid cells. The multi-fidelity model can predict with accuracy comparable to a high-fidelity model trained with the same amount of high-fidelity data with 81% less data generation costs. We further test the generalizability of the multi-fidelity model on a same reservoir model with a finer discretization of 1 million grid cells. This case was made more challenging by employing high-fidelity and low-fidelity datasets generated by different geostatistical models and reservoir simulators. We observe that the multi-fidelity FNO model can predict pressure fields with reasonable accuracy even when the high-fidelity data are extremely limited.
    Adversarial Training Using Feedback Loops. (arXiv:2308.11881v2 [cs.LG] UPDATED)
    Deep neural networks (DNN) have found wide applicability in numerous fields due to their ability to accurately learn very complex input-output relations. Despite their accuracy and extensive use, DNNs are highly susceptible to adversarial attacks due to limited generalizability. For future progress in the field, it is essential to build DNNs that are robust to any kind of perturbations to the data points. In the past, many techniques have been proposed to robustify DNNs using first-order derivative information of the network. This paper proposes a new robustification approach based on control theory. A neural network architecture that incorporates feedback control, named Feedback Neural Networks, is proposed. The controller is itself a neural network, which is trained using regular and adversarial data such as to stabilize the system outputs. The novel adversarial training approach based on the feedback control architecture is called Feedback Looped Adversarial Training (FLAT). Numerical results on standard test problems empirically show that our FLAT method is more effective than the state-of-the-art to guard against adversarial attacks.
    Unifying Gradients to Improve Real-world Robustness for Deep Networks. (arXiv:2208.06228v2 [stat.ML] UPDATED)
    The wide application of deep neural networks (DNNs) demands an increasing amount of attention to their real-world robustness, i.e., whether a DNN resists black-box adversarial attacks, among which score-based query attacks (SQAs) are most threatening since they can effectively hurt a victim network with the only access to model outputs. Defending against SQAs requires a slight but artful variation of outputs due to the service purpose for users, who share the same output information with SQAs. In this paper, we propose a real-world defense by Unifying Gradients (UniG) of different data so that SQAs could only probe a much weaker attack direction that is similar for different samples. Since such universal attack perturbations have been validated as less aggressive than the input-specific perturbations, UniG protects real-world DNNs by indicating attackers a twisted and less informative attack direction. We implement UniG efficiently by a Hadamard product module which is plug-and-play. According to extensive experiments on 5 SQAs, 2 adaptive attacks and 7 defense baselines, UniG significantly improves real-world robustness without hurting clean accuracy on CIFAR10 and ImageNet. For instance, UniG maintains a model of 77.80% accuracy under 2500-query Square attack while the state-of-the-art adversarially-trained model only has 67.34% on CIFAR10. Simultaneously, UniG outperforms all compared baselines in terms of clean accuracy and achieves the smallest modification of the model output. The code is released at https://github.com/snowien/UniG-pytorch.
    Open-set Face Recognition with Neural Ensemble, Maximal Entropy Loss and Feature Augmentation. (arXiv:2308.12371v1 [cs.CV])
    Open-set face recognition refers to a scenario in which biometric systems have incomplete knowledge of all existing subjects. Therefore, they are expected to prevent face samples of unregistered subjects from being identified as previously enrolled identities. This watchlist context adds an arduous requirement that calls for the dismissal of irrelevant faces by focusing mainly on subjects of interest. As a response, this work introduces a novel method that associates an ensemble of compact neural networks with a margin-based cost function that explores additional samples. Supplementary negative samples can be obtained from external databases or synthetically built at the representation level in training time with a new mix-up feature augmentation approach. Deep neural networks pre-trained on large face datasets serve as the preliminary feature extraction module. We carry out experiments on well-known LFW and IJB-C datasets where results show that the approach is able to boost closed and open-set identification rates.
    APART: Diverse Skill Discovery using All Pairs with Ascending Reward and DropouT. (arXiv:2308.12649v1 [cs.LG])
    We study diverse skill discovery in reward-free environments, aiming to discover all possible skills in simple grid-world environments where prior methods have struggled to succeed. This problem is formulated as mutual training of skills using an intrinsic reward and a discriminator trained to predict a skill given its trajectory. Our initial solution replaces the standard one-vs-all (softmax) discriminator with a one-vs-one (all pairs) discriminator and combines it with a novel intrinsic reward function and a dropout regularization technique. The combined approach is named APART: Diverse Skill Discovery using All Pairs with Ascending Reward and Dropout. We demonstrate that APART discovers all the possible skills in grid worlds with remarkably fewer samples than previous works. Motivated by the empirical success of APART, we further investigate an even simpler algorithm that achieves maximum skills by altering VIC, rescaling its intrinsic reward, and tuning the temperature of its softmax discriminator. We believe our findings shed light on the crucial factors underlying success of skill discovery algorithms in reinforcement learning.
    Job Shop Scheduling Benchmark: Environments and Instances for Learning and Non-learning Methods. (arXiv:2308.12794v1 [cs.AI])
    We introduce an open-source GitHub repository containing comprehensive benchmarks for a wide range of machine scheduling problems, including Job Shop Scheduling (JSP), Flow Shop Scheduling (FSP), Flexible Job Shop Scheduling (FJSP), FJSP with Assembly constraints (FAJSP), FJSP with Sequence-Dependent Setup Times (FJSP-SDST), and the online FJSP (with online job arrivals). Our primary goal is to provide a centralized hub for researchers, practitioners, and enthusiasts interested in tackling machine scheduling challenges.
    Auto-weighted Bayesian Physics-Informed Neural Networks and robust estimations for multitask inverse problems in pore-scale imaging of dissolution. (arXiv:2308.12864v1 [cs.LG])
    In this article, we present a novel data assimilation strategy in pore-scale imaging and demonstrate that this makes it possible to robustly address reactive inverse problems incorporating Uncertainty Quantification (UQ). Pore-scale modeling of reactive flow offers a valuable opportunity to investigate the evolution of macro-scale properties subject to dynamic processes. Yet, they suffer from imaging limitations arising from the associated X-ray microtomography (X-ray microCT) process, which induces discrepancies in the properties estimates. Assessment of the kinetic parameters also raises challenges, as reactive coefficients are critical parameters that can cover a wide range of values. We account for these two issues and ensure reliable calibration of pore-scale modeling, based on dynamical microCT images, by integrating uncertainty quantification in the workflow. The present method is based on a multitasking formulation of reactive inverse problems combining data-driven and physics-informed techniques in calcite dissolution. This allows quantifying morphological uncertainties on the porosity field and estimating reactive parameter ranges through prescribed PDE models with a latent concentration field and dynamical microCT. The data assimilation strategy relies on sequential reinforcement incorporating successively additional PDE constraints. We guarantee robust and unbiased uncertainty quantification by straightforward adaptive weighting of Bayesian Physics-Informed Neural Networks (BPINNs), ensuring reliable micro-porosity changes during geochemical transformations. We demonstrate successful Bayesian Inference in 1D+Time and 2D+Time calcite dissolution based on synthetic microCT images with meaningful posterior distribution on the reactive parameters and dimensionless numbers.
    Minimum intrinsic dimension scaling for entropic optimal transport. (arXiv:2306.03398v2 [math.ST] UPDATED)
    Motivated by the manifold hypothesis, which states that data with a high extrinsic dimension may yet have a low intrinsic dimension, we develop refined statistical bounds for entropic optimal transport that are sensitive to the intrinsic dimension of the data. Our bounds involve a robust notion of intrinsic dimension, measured at only a single distance scale depending on the regularization parameter, and show that it is only the minimum of these single-scale intrinsic dimensions which governs the rate of convergence. We call this the Minimum Intrinsic Dimension scaling (MID scaling) phenomenon, and establish MID scaling with no assumptions on the data distributions so long as the cost is bounded and Lipschitz, and for various entropic optimal transport quantities beyond just values, with stronger analogs when one distribution is supported on a manifold. Our results significantly advance the theoretical state of the art by showing that MID scaling is a generic phenomenon, and provide the first rigorous interpretation of the statistical effect of entropic regularization as a distance scale.
    BadVFL: Backdoor Attacks in Vertical Federated Learning. (arXiv:2304.08847v2 [cs.LG] UPDATED)
    Federated learning (FL) enables multiple parties to collaboratively train a machine learning model without sharing their data; rather, they train their own model locally and send updates to a central server for aggregation. Depending on how the data is distributed among the participants, FL can be classified into Horizontal (HFL) and Vertical (VFL). In VFL, the participants share the same set of training instances but only host a different and non-overlapping subset of the whole feature space. Whereas in HFL, each participant shares the same set of features while the training set is split into locally owned training data subsets. VFL is increasingly used in applications like financial fraud detection; nonetheless, very little work has analyzed its security. In this paper, we focus on robustness in VFL, in particular, on backdoor attacks, whereby an adversary attempts to manipulate the aggregate model during the training process to trigger misclassifications. Performing backdoor attacks in VFL is more challenging than in HFL, as the adversary i) does not have access to the labels during training and ii) cannot change the labels as she only has access to the feature embeddings. We present a first-of-its-kind clean-label backdoor attack in VFL, which consists of two phases: a label inference and a backdoor phase. We demonstrate the effectiveness of the attack on three different datasets, investigate the factors involved in its success, and discuss countermeasures to mitigate its impact.
    A Survey on Blood Pressure Measurement Technologies: Addressing Potential Sources of Bias. (arXiv:2306.08451v2 [physics.med-ph] UPDATED)
    Regular blood pressure (BP) monitoring in clinical and ambulatory settings plays a crucial role in the prevention, diagnosis, treatment, and management of cardiovascular diseases. Recently, the widespread adoption of ambulatory BP measurement devices has been driven predominantly by the increased prevalence of hypertension and its associated risks and clinical conditions. Recent guidelines advocate for regular BP monitoring as part of regular clinical visits or even at home. This increased utilization of BP measurement technologies has brought up significant concerns, regarding the accuracy of reported BP values across settings. In this survey, focusing mainly on cuff-based BP monitoring technologies, we highlight how BP measurements can demonstrate substantial biases and variances due to factors such as measurement and device errors, demographics, and body habitus. With these inherent biases, the development of a new generation of cuff-based BP devices which use artificial-intelligence (AI) has significant potential. We present future avenues where AI-assisted technologies can leverage the extensive clinical literature on BP-related studies together with the large collections of BP records available in electronic health records. These resources can be combined with machine learning approaches, including deep learning and Bayesian inference, to remove BP measurement biases and to provide individualized BP-related cardiovascular risk indexes.
    BagPipe: Accelerating Deep Recommendation Model Training. (arXiv:2202.12429v3 [cs.DC] UPDATED)
    Deep learning based recommendation models (DLRM) are widely used in several business critical applications. Training such recommendation models efficiently is challenging because they contain billions of embedding-based parameters, leading to significant overheads from embedding access. By profiling existing systems for DLRM training, we observe that around 75\% of the iteration time is spent on embedding access and model synchronization. Our key insight in this paper is that embedding access has a specific structure which can be used to accelerate training. We observe that embedding accesses are heavily skewed, with around 1\% of embeddings representing more than 92\% of total accesses. Further, we observe that during offline training we can lookahead at future batches to determine exactly which embeddings will be needed at what iteration in the future. Based on these insights, we develop Bagpipe, a system for training deep recommendation models that uses caching and prefetching to overlap remote embedding accesses with the computation. We design an Oracle Cacher, a new component that uses a lookahead algorithm to generate optimal cache update decisions while providing strong consistency guarantees against staleness. We also design a logically replicated, physically partitioned cache and show that our design can reduce synchronization overheads in a distributed setting. Finally, we propose a disaggregated system architecture and show that our design can enable low-overhead fault tolerance. Our experiments using three datasets and four models show that Bagpipe provides a speed up of up to 5.6x compared to state of the art baselines, while providing the same convergence and reproducibility guarantees as synchronous training.
    Human Comprehensible Active Learning of Genome-Scale Metabolic Networks. (arXiv:2308.12740v1 [cs.AI])
    An important application of Synthetic Biology is the engineering of the host cell system to yield useful products. However, an increase in the scale of the host system leads to huge design space and requires a large number of validation trials with high experimental costs. A comprehensible machine learning approach that efficiently explores the hypothesis space and guides experimental design is urgently needed for the Design-Build-Test-Learn (DBTL) cycle of the host cell system. We introduce a novel machine learning framework ILP-iML1515 based on Inductive Logic Programming (ILP) that performs abductive logical reasoning and actively learns from training examples. In contrast to numerical models, ILP-iML1515 is built on comprehensible logical representations of a genome-scale metabolic model and can update the model by learning new logical structures from auxotrophic mutant trials. The ILP-iML1515 framework 1) allows high-throughput simulations and 2) actively selects experiments that reduce the experimental cost of learning gene functions in comparison to randomly selected experiments.
    Motion In-Betweening with Phase Manifolds. (arXiv:2308.12751v1 [cs.GR])
    This paper introduces a novel data-driven motion in-betweening system to reach target poses of characters by making use of phases variables learned by a Periodic Autoencoder. Our approach utilizes a mixture-of-experts neural network model, in which the phases cluster movements in both space and time with different expert weights. Each generated set of weights then produces a sequence of poses in an autoregressive manner between the current and target state of the character. In addition, to satisfy poses which are manually modified by the animators or where certain end effectors serve as constraints to be reached by the animation, a learned bi-directional control scheme is implemented to satisfy such constraints. The results demonstrate that using phases for motion in-betweening tasks sharpen the interpolated movements, and furthermore stabilizes the learning process. Moreover, using phases for motion in-betweening tasks can also synthesize more challenging movements beyond locomotion behaviors. Additionally, style control is enabled between given target keyframes. Our proposed framework can compete with popular state-of-the-art methods for motion in-betweening in terms of motion quality and generalization, especially in the existence of long transition durations. Our framework contributes to faster prototyping workflows for creating animated character sequences, which is of enormous interest for the game and film industry.
    Near Optimal Adversarial Attack on UCB Bandits. (arXiv:2008.09312v6 [cs.LG] UPDATED)
    I study a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. I propose a novel attack strategy that manipulates a learner employing the UCB algorithm into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\widehat{O}(\sqrt{\log T})$, where $T$ is the number of rounds. I also prove the first lower bound on the cumulative attack cost. The lower bound matches the upper bound up to $O(\log \log T)$ factors, showing the proposed attack strategy to be near optimal.
    Prediction without Preclusion: Recourse Verification with Reachable Sets. (arXiv:2308.12820v1 [cs.LG])
    Machine learning models are often used to decide who will receive a loan, a job interview, or a public benefit. Standard techniques to build these models use features about people but overlook their actionability. In turn, models can assign predictions that are fixed, meaning that consumers who are denied loans, interviews, or benefits may be permanently locked out from access to credit, employment, or assistance. In this work, we introduce a formal testing procedure to flag models that assign fixed predictions that we call recourse verification. We develop machinery to reliably determine if a given model can provide recourse to its decision subjects from a set of user-specified actionability constraints. We demonstrate how our tools can ensure recourse and adversarial robustness in real-world datasets and use them to study the infeasibility of recourse in real-world lending datasets. Our results highlight how models can inadvertently assign fixed predictions that permanently bar access, and we provide tools to design algorithms that account for actionability when developing models.
    Collect, Measure, Repeat: Reliability Factors for Responsible AI Data Collection. (arXiv:2308.12885v1 [cs.LG])
    The rapid entry of machine learning approaches in our daily activities and high-stakes domains demands transparency and scrutiny of their fairness and reliability. To help gauge machine learning models' robustness, research typically focuses on the massive datasets used for their deployment, e.g., creating and maintaining documentation for understanding their origin, process of development, and ethical considerations. However, data collection for AI is still typically a one-off practice, and oftentimes datasets collected for a certain purpose or application are reused for a different problem. Additionally, dataset annotations may not be representative over time, contain ambiguous or erroneous annotations, or be unable to generalize across issues or domains. Recent research has shown these practices might lead to unfair, biased, or inaccurate outcomes. We argue that data collection for AI should be performed in a responsible manner where the quality of the data is thoroughly scrutinized and measured through a systematic set of appropriate metrics. In this paper, we propose a Responsible AI (RAI) methodology designed to guide the data collection with a set of metrics for an iterative in-depth analysis of the factors influencing the quality and reliability} of the generated data. We propose a granular set of measurements to inform on the internal reliability of a dataset and its external stability over time. We validate our approach across nine existing datasets and annotation tasks and four content modalities. This approach impacts the assessment of data robustness used for AI applied in the real world, where diversity of users and content is eminent. Furthermore, it deals with fairness and accountability aspects in data collection by providing systematic and transparent quality analysis for data collections.
    The Polynomial Method is Universal for Distribution-Free Correlational SQ Learning. (arXiv:2010.11925v3 [cs.DS] UPDATED)
    We consider the problem of distribution-free learning for Boolean function classes in the PAC and agnostic models. Generalizing a beautiful work of Malach and Shalev-Shwartz (2022) that gave tight correlational SQ (CSQ) lower bounds for learning DNF formulas, we give new proofs that lower bounds on the threshold or approximate degree of any function class directly imply CSQ lower bounds for PAC or agnostic learning respectively. While such bounds implicitly follow by combining prior results by Feldman (2008, 2012) and Sherstov (2008, 2011), to our knowledge the precise statements we give had not appeared in this form before. Moreover, our proofs are simple and largely self-contained. These lower bounds match corresponding positive results using upper bounds on the threshold or approximate degree in the SQ model for PAC or agnostic learning, and in this sense these results show that the polynomial method is a universal, best-possible approach for distribution-free CSQ learning.
    Uncertainty and Explainable Analysis of Machine Learning Model for Reconstruction of Sonic Slowness Logs. (arXiv:2308.12625v1 [cs.LG])
    Logs are valuable information for oil and gas fields as they help to determine the lithology of the formations surrounding the borehole and the location and reserves of subsurface oil and gas reservoirs. However, important logs are often missing in horizontal or old wells, which poses a challenge in field applications. In this paper, we utilize data from the 2020 machine learning competition of the SPWLA, which aims to predict the missing compressional wave slowness and shear wave slowness logs using other logs in the same borehole. We employ the NGBoost algorithm to construct an Ensemble Learning model that can predicate the results as well as their uncertainty. Furthermore, we combine the SHAP method to investigate the interpretability of the machine learning model. We compare the performance of the NGBosst model with four other commonly used Ensemble Learning methods, including Random Forest, GBDT, XGBoost, LightGBM. The results show that the NGBoost model performs well in the testing set and can provide a probability distribution for the prediction results. In addition, the variance of the probability distribution of the predicted log can be used to justify the quality of the constructed log. Using the SHAP explainable machine learning model, we calculate the importance of each input log to the predicted results as well as the coupling relationship among input logs. Our findings reveal that the NGBoost model tends to provide greater slowness prediction results when the neutron porosity and gamma ray are large, which is consistent with the cognition of petrophysical models. Furthermore, the machine learning model can capture the influence of the changing borehole caliper on slowness, where the influence of borehole caliper on slowness is complex and not easy to establish a direct relationship. These findings are in line with the physical principle of borehole acoustics.
    Universal Soldier: Using Universal Adversarial Perturbations for Detecting Backdoor Attacks. (arXiv:2302.00747v3 [cs.LG] UPDATED)
    Deep learning models achieve excellent performance in numerous machine learning tasks. Yet, they suffer from security-related issues such as adversarial examples and poisoning (backdoor) attacks. A deep learning model may be poisoned by training with backdoored data or by modifying inner network parameters. Then, a backdoored model performs as expected when receiving a clean input, but it misclassifies when receiving a backdoored input stamped with a pre-designed pattern called "trigger". Unfortunately, it is difficult to distinguish between clean and backdoored models without prior knowledge of the trigger. This paper proposes a backdoor detection method by utilizing a special type of adversarial attack, universal adversarial perturbation (UAP), and its similarities with a backdoor trigger. We observe an intuitive phenomenon: UAPs generated from backdoored models need fewer perturbations to mislead the model than UAPs from clean models. UAPs of backdoored models tend to exploit the shortcut from all classes to the target class, built by the backdoor trigger. We propose a novel method called Universal Soldier for Backdoor detection (USB) and reverse engineering potential backdoor triggers via UAPs. Experiments on 345 models trained on several datasets show that USB effectively detects the injected backdoor and provides comparable or better results than state-of-the-art methods.
    Inverse Lithography Physics-informed Deep Neural Level Set for Mask Optimization. (arXiv:2308.12299v1 [eess.IV])
    As the feature size of integrated circuits continues to decrease, optical proximity correction (OPC) has emerged as a crucial resolution enhancement technology for ensuring high printability in the lithography process. Recently, level set-based inverse lithography technology (ILT) has drawn considerable attention as a promising OPC solution, showcasing its powerful pattern fidelity, especially in advanced process. However, massive computational time consumption of ILT limits its applicability to mainly correcting partial layers and hotspot regions. Deep learning (DL) methods have shown great potential in accelerating ILT. However, lack of domain knowledge of inverse lithography limits the ability of DL-based algorithms in process window (PW) enhancement and etc. In this paper, we propose an inverse lithography physics-informed deep neural level set (ILDLS) approach for mask optimization. This approach utilizes level set based-ILT as a layer within the DL framework and iteratively conducts mask prediction and correction to significantly enhance printability and PW in comparison with results from pure DL and ILT. With this approach, computation time is reduced by a few orders of magnitude versus ILT. By gearing up DL with knowledge of inverse lithography physics, ILDLS provides a new and efficient mask optimization solution.
    ICU Mortality Prediction Using Long Short-Term Memory Networks. (arXiv:2308.12800v1 [cs.LG])
    Extensive bedside monitoring in Intensive Care Units (ICUs) has resulted in complex temporal data regarding patient physiology, which presents an upscale context for clinical data analysis. In the other hand, identifying the time-series patterns within these data may provide a high aptitude to predict clinical events. Hence, we investigate, during this work, the implementation of an automatic data-driven system, which analyzes large amounts of multivariate temporal data derived from Electronic Health Records (EHRs), and extracts high-level information so as to predict in-hospital mortality and Length of Stay (LOS) early. Practically, we investigate the applicability of LSTM network by reducing the time-frame to 6-hour so as to enhance clinical tasks. The experimental results highlight the efficiency of LSTM model with rigorous multivariate time-series measurements for building real-world prediction engines.
    Fast Exact NPN Classification with Influence-aided Canonical Form. (arXiv:2308.12311v1 [cs.LG])
    NPN classification has many applications in the synthesis and verification of digital circuits. The canonical-form-based method is the most common approach, designing a canonical form as representative for the NPN equivalence class first and then computing the transformation function according to the canonical form. Most works use variable symmetries and several signatures, mainly based on the cofactor, to simplify the canonical form construction and computation. This paper describes a novel canonical form and its computation algorithm by introducing Boolean influence to NPN classification, which is a basic concept in analysis of Boolean functions. We show that influence is input-negation-independent, input-permutation-dependent, and has other structural information than previous signatures for NPN classification. Therefore, it is a significant ingredient in speeding up NPN classification. Experimental results prove that influence plays an important role in reducing the transformation enumeration in computing the canonical form. Compared with the state-of-the-art algorithm implemented in ABC, our influence-aided canonical form for exact NPN classification gains up to 5.5x speedup.
    A Greedy Approach for Offering to Telecom Subscribers. (arXiv:2308.12606v1 [stat.ML])
    Customer retention or churn prevention is a challenging task of a telecom operator. One of the effective approaches is to offer some attractive incentive or additional services or money to the subscribers for keeping them engaged and make sure they stay in the operator's network for longer time. Often, operators allocate certain amount of monetary budget to carry out the offer campaign. The difficult part of this campaign is the selection of a set of customers from a large subscriber-base and deciding the amount that should be offered to an individual so that operator's objective is achieved. There may be multiple objectives (e.g., maximizing revenue, minimizing number of churns) for selection of subscriber and selection of an offer to the selected subscriber. Apart from monetary benefit, offers may include additional data, SMS, hots-spot tethering, and many more. This problem is known as offer optimization. In this paper, we propose a novel combinatorial algorithm for solving offer optimization under heterogeneous offers by maximizing expected revenue under the scenario of subscriber churn, which is, in general, seen in telecom domain. The proposed algorithm is efficient and accurate even for a very large subscriber-base.
    Interneurons accelerate learning dynamics in recurrent neural networks for statistical adaptation. (arXiv:2209.10634v2 [q-bio.NC] UPDATED)
    Early sensory systems in the brain rapidly adapt to fluctuating input statistics, which requires recurrent communication between neurons. Mechanistically, such recurrent communication is often indirect and mediated by local interneurons. In this work, we explore the computational benefits of mediating recurrent communication via interneurons compared with direct recurrent connections. To this end, we consider two mathematically tractable recurrent linear neural networks that statistically whiten their inputs -- one with direct recurrent connections and the other with interneurons that mediate recurrent communication. By analyzing the corresponding continuous synaptic dynamics and numerically simulating the networks, we show that the network with interneurons is more robust to initialization than the network with direct recurrent connections in the sense that the convergence time for the synaptic dynamics in the network with interneurons (resp. direct recurrent connections) scales logarithmically (resp. linearly) with the spectrum of their initialization. Our results suggest that interneurons are computationally useful for rapid adaptation to changing input statistics. Interestingly, the network with interneurons is an overparameterized solution of the whitening objective for the network with direct recurrent connections, so our results can be viewed as a recurrent linear neural network analogue of the implicit acceleration phenomenon observed in overparameterized feedforward linear neural networks.
    Dense Text-to-Image Generation with Attention Modulation. (arXiv:2308.12964v1 [cs.CV])
    Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions while offering control over the scene layout. We first analyze the relationship between generated images' layouts and the pre-trained model's intermediate attention maps. Next, we develop an attention modulation method that guides objects to appear in specific regions according to layout guidance. Without requiring additional fine-tuning or datasets, we improve image generation performance given dense captions regarding both automatic and human evaluation scores. In addition, we achieve similar-quality visual results with models specifically trained with layout conditions.
    NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes. (arXiv:2308.12967v1 [cs.CV])
    Recent implicit neural representations have shown great results for novel view synthesis. However, existing methods require expensive per-scene optimization from many views hence limiting their application to real-world unbounded urban settings where the objects of interest or backgrounds are observed from very few views. To mitigate this challenge, we introduce a new approach called NeO 360, Neural fields for sparse view synthesis of outdoor scenes. NeO 360 is a generalizable method that reconstructs 360{\deg} scenes from a single or a few posed RGB images. The essence of our approach is in capturing the distribution of complex real-world outdoor 3D scenes and using a hybrid image-conditional triplanar representation that can be queried from any world point. Our representation combines the best of both voxel-based and bird's-eye-view (BEV) representations and is more effective and expressive than each. NeO 360's representation allows us to learn from a large collection of unbounded 3D scenes while offering generalizability to new views and novel scenes from as few as a single image during inference. We demonstrate our approach on the proposed challenging 360{\deg} unbounded dataset, called NeRDS 360, and show that NeO 360 outperforms state-of-the-art generalizable methods for novel view synthesis while also offering editing and composition capabilities. Project page: https://zubair-irshad.github.io/projects/neo360.html
    StableDR: Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random. (arXiv:2205.04701v3 [cs.LG] UPDATED)
    In recommender systems, users always choose the favorite items to rate, which leads to data missing not at random and poses a great challenge for unbiased evaluation and learning of prediction models. Currently, the doubly robust (DR) methods have been widely studied and demonstrate superior performance. However, in this paper, we show that DR methods are unstable and have unbounded bias, variance, and generalization bounds to extremely small propensities. Moreover, the fact that DR relies more on extrapolation will lead to suboptimal performance. To address the above limitations while retaining double robustness, we propose a stabilized doubly robust (StableDR) learning approach with a weaker reliance on extrapolation. Theoretical analysis shows that StableDR has bounded bias, variance, and generalization error bound simultaneously under inaccurate imputed errors and arbitrarily small propensities. In addition, we propose a novel learning approach for StableDR that updates the imputation, propensity, and prediction models cyclically, achieving more stable and accurate predictions. Extensive experiments show that our approaches significantly outperform the existing methods.
    Dealing with Small Datasets for Deep Learning in Medical Imaging: An Evaluation of Self-Supervised Pre-Training on CT Scans Comparing Contrastive and Masked Autoencoder Methods for Convolutional Models. (arXiv:2308.06534v2 [cs.CV] UPDATED)
    Deep learning in medical imaging has the potential to minimize the risk of diagnostic errors, reduce radiologist workload, and accelerate diagnosis. Training such deep learning models requires large and accurate datasets, with annotations for all training samples. However, in the medical imaging domain, annotated datasets for specific tasks are often small due to the high complexity of annotations, limited access, or the rarity of diseases. To address this challenge, deep learning models can be pre-trained on large image datasets without annotations using methods from the field of self-supervised learning. After pre-training, small annotated datasets are sufficient to fine-tune the models for a specific task. The most popular self-supervised pre-training approaches in medical imaging are based on contrastive learning. However, recent studies in natural image processing indicate a strong potential for masked autoencoder approaches. Our work compares state-of-the-art contrastive learning methods with the recently introduced masked autoencoder approach "SparK" for convolutional neural networks (CNNs) on medical images. Therefore we pre-train on a large unannotated CT image dataset and fine-tune on several CT classification tasks. Due to the challenge of obtaining sufficient annotated training data in medical imaging, it is of particular interest to evaluate how the self-supervised pre-training methods perform when fine-tuning on small datasets. By experimenting with gradually reducing the training dataset size for fine-tuning, we find that the reduction has different effects depending on the type of pre-training chosen. The SparK pre-training method is more robust to the training dataset size than the contrastive methods. Based on our results, we propose the SparK pre-training for medical imaging tasks with only small annotated datasets.
    Integer Factorisation, Fermat & Machine Learning on a Classical Computer. (arXiv:2308.12290v1 [cs.LG])
    In this paper we describe a deep learning--based probabilistic algorithm for integer factorisation. We use Lawrence's extension of Fermat's factorisation algorithm to reduce the integer factorisation problem to a binary classification problem. To address the classification problem, based on the ease of generating large pseudo--random primes, a corpus of training data, as large as needed, is synthetically generated. We will introduce the algorithm, summarise some experiments, analyse where these experiments fall short, and finally put out a call to others to reproduce, verify and see if this approach can be improved to a point where it becomes a practical, scalable factorisation algorithm.
    Easy attention: A simple self-attention mechanism for Transformers. (arXiv:2308.12874v1 [cs.LG])
    To improve the robustness of transformer neural networks used for temporal-dynamics prediction of chaotic systems, we propose a novel attention mechanism called easy attention. Due to the fact that self attention only makes usage of the inner product of queries and keys, it is demonstrated that the keys, queries and softmax are not necessary for obtaining the attention score required to capture long-term dependencies in temporal sequences. Through implementing singular-value decomposition (SVD) on the softmax attention score, we further observe that the self attention compresses contribution from both queries and keys in the spanned space of the attention score. Therefore, our proposed easy-attention method directly treats the attention scores as learnable parameters. This approach produces excellent results when reconstructing and predicting the temporal dynamics of chaotic systems exhibiting more robustness and less complexity than the self attention or the widely-used long short-term memory (LSTM) network. Our results show great potential for applications in more complex high-dimensional dynamical systems.
    Deep Reinforcement Learning-driven Cross-Community Energy Interaction Optimal Scheduling. (arXiv:2308.12554v1 [eess.SY])
    In order to coordinate energy interactions among various communities and energy conversions among multi-energy subsystems within the multi-community integrated energy system under uncertain conditions, and achieve overall optimization and scheduling of the comprehensive energy system, this paper proposes a comprehensive scheduling model that utilizes a multi-agent deep reinforcement learning algorithm to learn load characteristics of different communities and make decisions based on this knowledge. In this model, the scheduling problem of the integrated energy system is transformed into a Markov decision process and solved using a data-driven deep reinforcement learning algorithm, which avoids the need for modeling complex energy coupling relationships between multi-communities and multi-energy subsystems. The simulation results show that the proposed method effectively captures the load characteristics of different communities and utilizes their complementary features to coordinate reasonable energy interactions among them. This leads to a reduction in wind curtailment rate from 16.3% to 0% and lowers the overall operating cost by 5445.6 Yuan, demonstrating significant economic and environmental benefits.
    LANISTR: Multimodal Learning from Structured and Unstructured Data. (arXiv:2305.16556v2 [cs.LG] UPDATED)
    Multimodal large-scale pretraining has shown impressive performance for unstructured data including language, image, audio, and video. However, a prevalent real-world scenario involves the combination of structured data types (tabular, time-series) with unstructured data which has so far been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured data. The core of LANISTR's methodology is rooted in \textit{masking-based} training applied across both unimodal and multimodal levels. In particular, we introduce a new similarity-based multimodal masking loss that enables it to learn cross-modal relations from large-scale multimodal data with missing modalities. On two real-world datastes, MIMIC-IV (healthcare) and Amazon Product Review (retail), LANISTR demonstrates remarkable absolute improvements of 6.6\% (AUROC) and up to 14\% (accuracy) when fine-tuned on 0.1\% and 0.01\% of labeled data, respectively, compared to the state-of-the-art alternatives. Notably, these improvements are observed even in the presence of considerable missingness ratios of 35.7\% and 99.8\%, in the respective datasets.
    Transforming to Yoked Neural Networks to Improve ANN Structure. (arXiv:2306.02157v3 [cs.LG] UPDATED)
    Most existing classical artificial neural networks (ANN) are designed as a tree structure to imitate neural networks. In this paper, we argue that the connectivity of a tree is not sufficient to characterize a neural network. The nodes of the same level of a tree cannot be connected with each other, i.e., these neural unit cannot share information with each other, which is a major drawback of ANN. Although ANN has been significantly improved in recent years to more complex structures, such as the directed acyclic graph (DAG), these methods also have unidirectional and acyclic bias for ANN. In this paper, we propose a method to build a bidirectional complete graph for the nodes in the same level of an ANN, which yokes the nodes of the same level to formulate a neural module. We call our model as YNN in short. YNN promotes the information transfer significantly which obviously helps in improving the performance of the method. Our YNN can imitate neural networks much better compared with the traditional ANN. In this paper, we analyze the existing structural bias of ANN and propose a model YNN to efficiently eliminate such structural bias. In our model, nodes also carry out aggregation and transformation of features, and edges determine the flow of information. We further impose auxiliary sparsity constraint to the distribution of connectedness, which promotes the learned structure to focus on critical connections. Finally, based on the optimized structure, we also design small neural module structure based on the minimum cut technique to reduce the computational burden of the YNN model. This learning process is compatible with the existing networks and different tasks. The obtained quantitative experimental results reflect that the learned connectivity is superior to the traditional NN structure.
    BridgeData V2: A Dataset for Robot Learning at Scale. (arXiv:2308.12952v1 [cs.RO])
    We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains, and institutions, making the dataset a useful resource for a broad range of researchers. Additionally, the dataset is compatible with a wide variety of open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions. In our experiments, we train 6 state-of-the-art imitation learning and offline reinforcement learning methods on our dataset, and find that they succeed on a suite of tasks requiring varying amounts of generalization. We also demonstrate that the performance of these methods improves with more data and higher capacity models, and that training on a greater variety of skills leads to improved generalization. By publicly sharing BridgeData V2 and our pre-trained models, we aim to accelerate research in scalable robot learning methods. Project page at https://rail-berkeley.github.io/bridgedata
    LCANets++: Robust Audio Classification using Multi-layer Neural Networks with Lateral Competition. (arXiv:2308.12882v1 [cs.SD])
    Audio classification aims at recognizing audio signals, including speech commands or sound events. However, current audio classifiers are susceptible to perturbations and adversarial attacks. In addition, real-world audio classification tasks often suffer from limited labeled data. To help bridge these gaps, previous work developed neuro-inspired convolutional neural networks (CNNs) with sparse coding via the Locally Competitive Algorithm (LCA) in the first layer (i.e., LCANets) for computer vision. LCANets learn in a combination of supervised and unsupervised learning, reducing dependency on labeled samples. Motivated by the fact that auditory cortex is also sparse, we extend LCANets to audio recognition tasks and introduce LCANets++, which are CNNs that perform sparse coding in multiple layers via LCA. We demonstrate that LCANets++ are more robust than standard CNNs and LCANets against perturbations, e.g., background noise, as well as black-box and white-box attacks, e.g., evasion and fast gradient sign (FGSM) attacks.
    Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication. (arXiv:2306.10466v2 [cs.LG] UPDATED)
    Graphs are omnipresent and GNNs are a powerful family of neural networks for learning over graphs. Despite their popularity, scaling GNNs either by deepening or widening suffers from prevalent issues of unhealthy gradients, over-smoothening, information squashing, which often lead to sub-standard performance. In this work, we are interested in exploring a principled way to scale GNNs capacity without deepening or widening, which can improve its performance across multiple small and large graphs. Motivated by the recent intriguing phenomenon of model soups, which suggest that fine-tuned weights of multiple large-language pre-trained models can be merged to a better minima, we argue to exploit the fundamentals of model soups to mitigate the aforementioned issues of memory bottleneck and trainability during GNNs scaling. More specifically, we propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs, i.e., to build powerful GNNs. By dividing giant graph data, we build multiple independently and parallelly trained weaker GNNs (soup ingredient) without any intermediate communication, and combine their strength using a greedy interpolation soup procedure to achieve state-of-the-art performance. Compared to concurrent distributed GNN training works such as Jiong et. al. 2023, we train each soup ingredient by sampling different subgraphs per epoch and their respective sub-models are merged only after being fully trained (rather than intermediately so). Moreover, we provide a wide variety of model soup preparation techniques by leveraging state-of-the-art graph sampling and graph partitioning approaches that can handle large graphs. Codes are available at: \url{https://github.com/VITA-Group/graph_ladling}.
    Conditional expectation using compactification operators. (arXiv:2306.10592v3 [stat.ML] UPDATED)
    The separate tasks of denoising, least squares expectation, and manifold learning can often be posed in a common setting of finding the conditional expectations arising from a product of two random variables. This paper focuses on this more general problem and describes an operator theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactification tool, to set up the estimation problem as a linear inverse problem in a reproducing kernel Hilbert space. This equation is shown to have solutions that allow numerical approximation, thus guaranteeing the convergence of data-driven implementations. The overall technique is easy to implement, and their successful application to some real-world problems are also shown.
    A Co-training Approach for Noisy Time Series Learning. (arXiv:2308.12551v1 [cs.LG])
    In this work, we focus on robust time series representation learning. Our assumption is that real-world time series is noisy and complementary information from different views of the same time series plays an important role while analyzing noisy input. Based on this, we create two views for the input time series through two different encoders. We conduct co-training based contrastive learning iteratively to learn the encoders. Our experiments demonstrate that this co-training approach leads to a significant improvement in performance. Especially, by leveraging the complementary information from different views, our proposed TS-CoT method can mitigate the impact of data noise and corruption. Empirical evaluations on four time series benchmarks in unsupervised and semi-supervised settings reveal that TS-CoT outperforms existing methods. Furthermore, the representations learned by TS-CoT can transfer well to downstream tasks through fine-tuning.
    Try with Simpler -- An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection. (arXiv:2308.12612v1 [cs.LG])
    The rapid growth of deep learning (DL) has spurred interest in enhancing log-based anomaly detection. This approach aims to extract meaning from log events (log message templates) and develop advanced DL models for anomaly detection. However, these DL methods face challenges like heavy reliance on training data, labels, and computational resources due to model complexity. In contrast, traditional machine learning and data mining techniques are less data-dependent and more efficient but less effective than DL. To make log-based anomaly detection more practical, the goal is to enhance traditional techniques to match DL's effectiveness. Previous research in a different domain (linking questions on Stack Overflow) suggests that optimized traditional techniques can rival state-of-the-art DL methods. Drawing inspiration from this concept, we conducted an empirical study. We optimized the unsupervised PCA (Principal Component Analysis), a traditional technique, by incorporating lightweight semantic-based log representation. This addresses the issue of unseen log events in training data, enhancing log representation. Our study compared seven log-based anomaly detection methods, including four DL-based, two traditional, and the optimized PCA technique, using public and industrial datasets. Results indicate that the optimized unsupervised PCA technique achieves similar effectiveness to advanced supervised/semi-supervised DL methods while being more stable with limited training data and resource-efficient. This demonstrates the adaptability and strength of traditional techniques through small yet impactful adaptations.
    An Intentional Forgetting-Driven Self-Healing Method For Deep Reinforcement Learning Systems. (arXiv:2308.12445v1 [cs.LG])
    Deep reinforcement learning (DRL) is increasingly applied in large-scale productions like Netflix and Facebook. As with most data-driven systems, DRL systems can exhibit undesirable behaviors due to environmental drifts, which often occur in constantly-changing production settings. Continual Learning (CL) is the inherent self-healing approach for adapting the DRL agent in response to the environment's conditions shifts. However, successive shifts of considerable magnitude may cause the production environment to drift from its original state. Recent studies have shown that these environmental drifts tend to drive CL into long, or even unsuccessful, healing cycles, which arise from inefficiencies such as catastrophic forgetting, warm-starting failure, and slow convergence. In this paper, we propose Dr. DRL, an effective self-healing approach for DRL systems that integrates a novel mechanism of intentional forgetting into vanilla CL to overcome its main issues. Dr. DRL deliberately erases the DRL system's minor behaviors to systematically prioritize the adaptation of the key problem-solving skills. Using well-established DRL algorithms, Dr. DRL is compared with vanilla CL on various drifted environments. Dr. DRL is able to reduce, on average, the healing time and fine-tuning episodes by, respectively, 18.74% and 17.72%. Dr. DRL successfully helps agents to adapt to 19.63% of drifted environments left unsolved by vanilla CL while maintaining and even enhancing by up to 45% the obtained rewards for drifted environments that are resolved by both approaches.
    Convergence of the Backward Deep BSDE Method with Applications to Optimal Stopping Problems. (arXiv:2210.04118v3 [math.PR] UPDATED)
    The optimal stopping problem is one of the core problems in financial markets, with broad applications such as pricing American and Bermudan options. The deep BSDE method [Han, Jentzen and E, PNAS, 115(34):8505-8510, 2018] has shown great power in solving high-dimensional forward-backward stochastic differential equations (FBSDEs), and inspired many applications. However, the method solves backward stochastic differential equations (BSDEs) in a forward manner, which can not be used for optimal stopping problems that in general require running BSDE backwardly. To overcome this difficulty, a recent paper [Wang, Chen, Sudjianto, Liu and Shen, arXiv:1807.06622, 2018] proposed the backward deep BSDE method to solve the optimal stopping problem. In this paper, we provide the rigorous theory for the backward deep BSDE method. Specifically, 1. We derive the a posteriori error estimation, i.e., the error of the numerical solution can be bounded by the training loss function; and; 2. We give an upper bound of the loss function, which can be sufficiently small subject to universal approximations. We give two numerical examples, which present consistent performance with the proved theory.
    A Continual Learning Approach for Cross-Domain White Blood Cell Classification. (arXiv:2308.12679v1 [cs.CV])
    Accurate classification of white blood cells in peripheral blood is essential for diagnosing hematological diseases. Due to constantly evolving clinical settings, data sources, and disease classifications, it is necessary to update machine learning classification models regularly for practical real-world use. Such models significantly benefit from sequentially learning from incoming data streams without forgetting previously acquired knowledge. However, models can suffer from catastrophic forgetting, causing a drop in performance on previous tasks when fine-tuned on new data. Here, we propose a rehearsal-based continual learning approach for class incremental and domain incremental scenarios in white blood cell classification. To choose representative samples from previous tasks, we employ exemplar set selection based on the model's predictions. This involves selecting the most confident samples and the most challenging samples identified through uncertainty estimation of the model. We thoroughly evaluated our proposed approach on three white blood cell classification datasets that differ in color, resolution, and class composition, including scenarios where new domains or new classes are introduced to the model with every task. We also test a long class incremental experiment with both new domains and new classes. Our results demonstrate that our approach outperforms established baselines in continual learning, including existing iCaRL and EWC methods for classifying white blood cells in cross-domain environments.
    Geodesic Mode Connectivity. (arXiv:2308.12666v1 [cs.LG])
    Mode connectivity is a phenomenon where trained models are connected by a path of low loss. We reframe this in the context of Information Geometry, where neural networks are studied as spaces of parameterized distributions with curved geometry. We hypothesize that shortest paths in these spaces, known as geodesics, correspond to mode-connecting paths in the loss landscape. We propose an algorithm to approximate geodesics and demonstrate that they achieve mode connectivity.
    PFL-GAN: When Client Heterogeneity Meets Generative Models in Personalized Federated Learning. (arXiv:2308.12454v1 [cs.LG])
    Recent advances of generative learning models are accompanied by the growing interest in federated learning (FL) based on generative adversarial network (GAN) models. In the context of FL, GAN can capture the underlying client data structure, and regenerate samples resembling the original data distribution without compromising the private raw data. Although most existing GAN-based FL works focus on training a global model, Personalized FL (PFL) sometimes can be more effective in view of client data heterogeneity in terms of distinct data sample distributions, feature spaces, and labels. To cope with client heterogeneity in GAN-based FL, we propose a novel GAN sharing and aggregation strategy for PFL. The proposed PFL-GAN addresses the client heterogeneity in different scenarios. More specially, we first learn the similarity among clients and then develop an weighted collaborative data aggregation. The empirical results through the rigorous experimentation on several well-known datasets demonstrate the effectiveness of PFL-GAN.
    Variational Information Pursuit with Large Language and Multimodal Models for Interpretable Predictions. (arXiv:2308.12562v1 [cs.LG])
    Variational Information Pursuit (V-IP) is a framework for making interpretable predictions by design by sequentially selecting a short chain of task-relevant, user-defined and interpretable queries about the data that are most informative for the task. While this allows for built-in interpretability in predictive models, applying V-IP to any task requires data samples with dense concept-labeling by domain experts, limiting the application of V-IP to small-scale tasks where manual data annotation is feasible. In this work, we extend the V-IP framework with Foundational Models (FMs) to address this limitation. More specifically, we use a two-step process, by first leveraging Large Language Models (LLMs) to generate a sufficiently large candidate set of task-relevant interpretable concepts, then using Large Multimodal Models to annotate each data sample by semantic similarity with each concept in the generated concept set. While other interpretable-by-design frameworks such as Concept Bottleneck Models (CBMs) require an additional step of removing repetitive and non-discriminative concepts to have good interpretability and test performance, we mathematically and empirically justify that, with a sufficiently informative and task-relevant query (concept) set, the proposed FM+V-IP method does not require any type of concept filtering. In addition, we show that FM+V-IP with LLM generated concepts can achieve better test performance than V-IP with human annotated concepts, demonstrating the effectiveness of LLMs at generating efficient query sets. Finally, when compared to other interpretable-by-design frameworks such as CBMs, FM+V-IP can achieve competitive test performance using fewer number of concepts/queries in both cases with filtered or unfiltered concept sets.
    MoCLIM: Towards Accurate Cancer Subtyping via Multi-Omics Contrastive Learning with Omics-Inference Modeling. (arXiv:2308.09725v2 [q-bio.GN] UPDATED)
    Precision medicine fundamentally aims to establish causality between dysregulated biochemical mechanisms and cancer subtypes. Omics-based cancer subtyping has emerged as a revolutionary approach, as different level of omics records the biochemical products of multistep processes in cancers. This paper focuses on fully exploiting the potential of multi-omics data to improve cancer subtyping outcomes, and hence developed MoCLIM, a representation learning framework. MoCLIM independently extracts the informative features from distinct omics modalities. Using a unified representation informed by contrastive learning of different omics modalities, we can well-cluster the subtypes, given cancer, into a lower latent space. This contrast can be interpreted as a projection of inter-omics inference observed in biological networks. Experimental results on six cancer datasets demonstrate that our approach significantly improves data fit and subtyping performance in fewer high-dimensional cancer instances. Moreover, our framework incorporates various medical evaluations as the final component, providing high interpretability in medical analysis.
    Expectation-Complete Graph Representations with Homomorphisms. (arXiv:2306.05838v2 [cs.LG] UPDATED)
    We investigate novel random graph embeddings that can be computed in expected polynomial time and that are able to distinguish all non-isomorphic graphs in expectation. Previous graph embeddings have limited expressiveness and either cannot distinguish all graphs or cannot be computed efficiently for every graph. To be able to approximate arbitrary functions on graphs, we are interested in efficient alternatives that become arbitrarily expressive with increasing resources. Our approach is based on Lov\'asz' characterisation of graph isomorphism through an infinite dimensional vector of homomorphism counts. Our empirical evaluation shows competitive results on several benchmark graph learning tasks.
    FedSoL: Bridging Global Alignment and Local Generality in Federated Learning. (arXiv:2308.12532v1 [cs.LG])
    Federated Learning (FL) aggregates locally trained models from individual clients to construct a global model. While FL enables learning a model with data privacy, it often suffers from significant performance degradation when client data distributions are heterogeneous. Many previous FL algorithms have addressed this issue by introducing various proximal restrictions. These restrictions aim to encourage global alignment by constraining the deviation of local learning from the global objective. However, they inherently limit local learning by interfering with the original local objectives. Recently, an alternative approach has emerged to improve local learning generality. By obtaining local models within a smooth loss landscape, this approach mitigates conflicts among different local objectives of the clients. Yet, it does not ensure stable global alignment, as local learning does not take the global objective into account. In this study, we propose Federated Stability on Learning (FedSoL), which combines both the concepts of global alignment and local generality. In FedSoL, the local learning seeks a parameter region robust against proximal perturbations. This strategy introduces an implicit proximal restriction effect in local learning while maintaining the original local objective for parameter update. Our experiments show that FedSoL consistently achieves state-of-the-art performance on various setups.
    Unified Data Management and Comprehensive Performance Evaluation for Urban Spatial-Temporal Prediction [Experiment, Analysis & Benchmark]. (arXiv:2308.12899v1 [cs.LG])
    The field of urban spatial-temporal prediction is advancing rapidly with the development of deep learning techniques and the availability of large-scale datasets. However, challenges persist in accessing and utilizing diverse urban spatial-temporal datasets from different sources and stored in different formats, as well as determining effective model structures and components with the proliferation of deep learning models. This work addresses these challenges and provides three significant contributions. Firstly, we introduce "atomic files", a unified storage format designed for urban spatial-temporal big data, and validate its effectiveness on 40 diverse datasets, simplifying data management. Secondly, we present a comprehensive overview of technological advances in urban spatial-temporal prediction models, guiding the development of robust models. Thirdly, we conduct extensive experiments using diverse models and datasets, establishing a performance leaderboard and identifying promising research directions. Overall, this work effectively manages urban spatial-temporal data, guides future efforts, and facilitates the development of accurate and efficient urban spatial-temporal prediction models. It can potentially make long-term contributions to urban spatial-temporal data management and prediction, ultimately leading to improved urban living standards.
    Disentanglement Learning via Topology. (arXiv:2308.12696v1 [cs.LG])
    We propose TopDis (Topological Disentanglement), a method for learning disentangled representations via adding multi-scale topological loss term. Disentanglement is a crucial property of data representations substantial for the explainability and robustness of deep learning models and a step towards high-level cognition. The state-of-the-art method based on VAE minimizes the total correlation of the joint distribution of latent variables. We take a different perspective on disentanglement by analyzing topological properties of data manifolds. In particular, we optimize the topological similarity for data manifolds traversals. To the best of our knowledge, our paper is the first one to propose a differentiable topological loss for disentanglement. Our experiments have shown that the proposed topological loss improves disentanglement scores such as MIG, FactorVAE score, SAP score and DCI disentanglement score with respect to state-of-the-art results. Our method works in an unsupervised manner, permitting to apply it for problems without labeled factors of variation. Additionally, we show how to use the proposed topological loss to find disentangled directions in a trained GAN.
    Feature Unlearning for Pre-trained GANs and VAEs. (arXiv:2303.05699v2 [cs.CV] UPDATED)
    We tackle the problem of feature unlearning from a pre-trained image generative model: GANs and VAEs. Unlike a common unlearning task where an unlearning target is a subset of the training set, we aim to unlearn a specific feature, such as hairstyle from facial images, from the pre-trained generative models. As the target feature is only presented in a local region of an image, unlearning the entire image from the pre-trained model may result in losing other details in the remaining region of the image. To specify which features to unlearn, we collect randomly generated images that contain the target features. We then identify a latent representation corresponding to the target feature and then use the representation to fine-tune the pre-trained model. Through experiments on MNIST and CelebA datasets, we show that target features are successfully removed while keeping the fidelity of the original models. Further experiments with an adversarial attack show that the unlearned model is more robust under the presence of malicious parties.
    Multivariate Time-Series Anomaly Detection with Contaminated Data: Application to Physiological Signals. (arXiv:2308.12563v1 [cs.LG])
    Mainstream unsupervised anomaly detection algorithms often excel in academic datasets, yet their real-world performance is restricted due to the controlled experimental conditions involving clean training data. Addressing the challenge of training with noise, a prevalent issue in practical anomaly detection, is frequently overlooked. In a pioneering endeavor, this study delves into the realm of label-level noise within sensory time-series anomaly detection (TSAD). This paper presents a novel and practical end-to-end unsupervised TSAD when the training data are contaminated with anomalies. The introduced approach, called TSAD-C, is devoid of access to abnormality labels during the training phase. TSAD-C encompasses three modules: a Decontaminator to rectify the abnormalities (aka noise) present in the training data, a Variable Dependency Modeling module to capture both long-term intra- and inter-variable dependencies within the decontaminated data that can be considered as a surrogate of the pure normal data, and an Anomaly Scoring module to detect anomalies. Our extensive experiments conducted on three widely used physiological datasets conclusively demonstrate that our approach surpasses existing methodologies, thus establishing a new state-of-the-art performance in the field.
    Evaluating the Vulnerabilities in ML systems in terms of adversarial attacks. (arXiv:2308.12918v1 [cs.LG])
    There have been recent adversarial attacks that are difficult to find. These new adversarial attacks methods may pose challenges to current deep learning cyber defense systems and could influence the future defense of cyberattacks. The authors focus on this domain in this research paper. They explore the consequences of vulnerabilities in AI systems. This includes discussing how they might arise, differences between randomized and adversarial examples and also potential ethical implications of vulnerabilities. Moreover, it is important to train the AI systems appropriately when they are in testing phase and getting them ready for broader use.
    Actuator Trajectory Planning for UAVs with Overhead Manipulator using Reinforcement Learning. (arXiv:2308.12843v1 [cs.RO])
    In this paper, we investigate the operation of an aerial manipulator system, namely an Unmanned Aerial Vehicle (UAV) equipped with a controllable arm with two degrees of freedom to carry out actuation tasks on the fly. Our solution is based on employing a Q-learning method to control the trajectory of the tip of the arm, also called \textit{end-effector}. More specifically, we develop a motion planning model based on Time To Collision (TTC), which enables a quadrotor UAV to navigate around obstacles while ensuring the manipulator's reachability. Additionally, we utilize a model-based Q-learning model to independently track and control the desired trajectory of the manipulator's end-effector, given an arbitrary baseline trajectory for the UAV platform. Such a combination enables a variety of actuation tasks such as high-altitude welding, structural monitoring and repair, battery replacement, gutter cleaning, sky scrapper cleaning, and power line maintenance in hard-to-reach and risky environments while retaining compatibility with flight control firmware. Our RL-based control mechanism results in a robust control strategy that can handle uncertainties in the motion of the UAV, offering promising performance. Specifically, our method achieves 92\% accuracy in terms of average displacement error (i.e. the mean distance between the target and obtained trajectory points) using Q-learning with 15,000 episodes
    Breaking the Communication-Privacy-Accuracy Tradeoff with $f$-Differential Privacy. (arXiv:2302.09624v2 [cs.CR] UPDATED)
    We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability. The commonly adopted compression schemes introduce information loss into local data while improving communication efficiency, and it remains an open problem whether such discrete-valued mechanisms provide any privacy protection. In this paper, we study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP). More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms, including the binomial noise and the binomial mechanisms that are proposed for privacy preservation, and the sign-based methods that are proposed for data compression, in closed-form expressions. We further investigate the amplification in privacy by sparsification and propose a ternary stochastic compressor. By leveraging compression for privacy amplification, we improve the existing methods by removing the dependency of accuracy (in terms of mean square error) on communication cost in the popular use case of distributed mean estimation, therefore breaking the three-way tradeoff between privacy, communication, and accuracy. Finally, we discuss the Byzantine resilience of the proposed mechanism and its application in federated learning.
    FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning. (arXiv:2308.12305v1 [cs.LG])
    Recently, foundation models have exhibited remarkable advancements in multi-modal learning. These models, equipped with millions (or billions) of parameters, typically require a substantial amount of data for finetuning. However, collecting and centralizing training data from diverse sectors becomes challenging due to distinct privacy regulations. Federated Learning (FL) emerges as a promising solution, enabling multiple clients to collaboratively train neural networks without centralizing their local data. To alleviate client computation burdens and communication overheads, previous works have adapted Parameter-efficient Finetuning (PEFT) methods for FL. Hereby, only a small fraction of the model parameters are optimized and communicated during federated communications. Nevertheless, most previous works have focused on a single modality and neglected one common phenomenon, i.e., the presence of data heterogeneity across the clients. Therefore, in this work, we propose a finetuning framework tailored to heterogeneous multi-modal FL, called Federated Dual-Aadapter Teacher (FedDAT). Specifically, our approach leverages a Dual-Adapter Teacher (DAT) to address data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for an efficient knowledge transfer. FedDAT is the first approach that enables an efficient distributed finetuning of foundation models for a variety of heterogeneous Vision-Language tasks. To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity, where FedDAT substantially outperforms the existing centralized PEFT methods adapted for FL.
    Short Run Transit Route Planning Decision Support System Using a Deep Learning-Based Weighted Graph. (arXiv:2308.12828v1 [cs.AI])
    Public transport routing plays a crucial role in transit network design, ensuring a satisfactory level of service for passengers. However, current routing solutions rely on traditional operational research heuristics, which can be time-consuming to implement and lack the ability to provide quick solutions. Here, we propose a novel deep learning-based methodology for a decision support system that enables public transport (PT) planners to identify short-term route improvements rapidly. By seamlessly adjusting specific sections of routes between two stops during specific times of the day, our method effectively reduces times and enhances PT services. Leveraging diverse data sources such as GTFS and smart card data, we extract features and model the transportation network as a directed graph. Using self-supervision, we train a deep learning model for predicting lateness values for road segments. These lateness values are then utilized as edge weights in the transportation graph, enabling efficient path searching. Through evaluating the method on Tel Aviv, we are able to reduce times on more than 9\% of the routes. The improved routes included both intraurban and suburban routes showcasing a fact highlighting the model's versatility. The findings emphasize the potential of our data-driven decision support system to enhance public transport and city logistics, promoting greater efficiency and reliability in PT services.
    Efficient Sensor Placement from Regression with Sparse Gaussian Processes in Continuous and Discrete Spaces. (arXiv:2303.00028v4 [cs.RO] UPDATED)
    The sensor placement problem is a common problem that arises when monitoring correlated phenomena, such as temperature and precipitation. Existing approaches to this problem typically use discrete optimization methods, which are computationally expensive and cannot scale to large problems. We address the sensor placement problem in correlated environments by reducing it to a regression problem that can be efficiently solved using sparse Gaussian processes (SGPs). Our approach can handle both discrete sensor placement problems-where sensors are limited to a subset of a given set of locations-and continuous sensor placement problems-where sensors can be placed anywhere in a bounded continuous region. We further generalize our approach to handle sensors with a non-point field of view and integrated observations. Our experimental results on three real-world datasets show that our approach generates sensor placements that result in reconstruction quality that is consistently on par or better than the prior state-of-the-art approach while being significantly faster. Our computationally efficient approach enables both large-scale sensor placement and fast robotic sensor placement for informative path planning algorithms.
    TAI-GAN: Temporally and Anatomically Informed GAN for early-to-late frame conversion in dynamic cardiac PET motion correction. (arXiv:2308.12443v1 [eess.IV])
    The rapid tracer kinetics of rubidium-82 ($^{82}$Rb) and high variation of cross-frame distribution in dynamic cardiac positron emission tomography (PET) raise significant challenges for inter-frame motion correction, particularly for the early frames where conventional intensity-based image registration techniques are not applicable. Alternatively, a promising approach utilizes generative methods to handle the tracer distribution changes to assist existing registration methods. To improve frame-wise registration and parametric quantification, we propose a Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) to transform the early frames into the late reference frame using an all-to-one mapping. Specifically, a feature-wise linear modulation layer encodes channel-wise parameters generated from temporal tracer kinetics information, and rough cardiac segmentations with local shifts serve as the anatomical information. We validated our proposed method on a clinical $^{82}$Rb PET dataset and found that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, motion estimation accuracy and clinical myocardial blood flow (MBF) quantification were improved compared to using the original frames. Our code is published at https://github.com/gxq1998/TAI-GAN.
    BallGAN: 3D-aware Image Synthesis with a Spherical Background. (arXiv:2301.09091v3 [cs.CV] UPDATED)
    3D-aware GANs aim to synthesize realistic 3D scenes such that they can be rendered in arbitrary perspectives to produce images. Although previous methods produce realistic images, they suffer from unstable training or degenerate solutions where the 3D geometry is unnatural. We hypothesize that the 3D geometry is underdetermined due to the insufficient constraint, i.e., being classified as real image to the discriminator is not enough. To solve this problem, we propose to approximate the background as a spherical surface and represent a scene as a union of the foreground placed in the sphere and the thin spherical background. It reduces the degree of freedom in the background field. Accordingly, we modify the volume rendering equation and incorporate dedicated constraints to design a novel 3D-aware GAN framework named BallGAN. BallGAN has multiple advantages as follows. 1) It produces more reasonable 3D geometry; the images of a scene across different viewpoints have better photometric consistency and fidelity than the state-of-the-art methods. 2) The training becomes much more stable. 3) The foreground can be separately rendered on top of different arbitrary backgrounds.
    Min-Max Optimization under Delays. (arXiv:2307.06886v2 [cs.LG] UPDATED)
    Delays and asynchrony are inevitable in large-scale machine-learning problems where communication plays a key role. As such, several works have extensively analyzed stochastic optimization with delayed gradients. However, as far as we are aware, no analogous theory is available for min-max optimization, a topic that has gained recent popularity due to applications in adversarial robustness, game theory, and reinforcement learning. Motivated by this gap, we examine the performance of standard min-max optimization algorithms with delayed gradient updates. First, we show (empirically) that even small delays can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on simple instances for which \texttt{EG} guarantees convergence in the absence of delays. Our empirical study thus suggests the need for a careful analysis of delayed versions of min-max optimization algorithms. Accordingly, under suitable technical assumptions, we prove that Gradient Descent-Ascent (\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee convergence to saddle points for convex-concave and strongly convex-strongly concave settings. Our complexity bounds reveal, in a transparent manner, the slow-down in convergence caused by delays.
    NeuralClothSim: Neural Deformation Fields Meet the Kirchhoff-Love Thin Shell Theory. (arXiv:2308.12970v1 [cs.GR])
    Cloth simulation is an extensively studied problem, with a plethora of solutions available in computer graphics literature. Existing cloth simulators produce realistic cloth deformations that obey different types of boundary conditions. Nevertheless, their operational principle remains limited in several ways: They operate on explicit surface representations with a fixed spatial resolution, perform a series of discretised updates (which bounds their temporal resolution), and require comparably large amounts of storage. Moreover, back-propagating gradients through the existing solvers is often not straightforward, which poses additional challenges when integrating them into modern neural architectures. In response to the limitations mentioned above, this paper takes a fundamentally different perspective on physically-plausible cloth simulation and re-thinks this long-standing problem: We propose NeuralClothSim, i.e., a new cloth simulation approach using thin shells, in which surface evolution is encoded in neural network weights. Our memory-efficient and differentiable solver operates on a new continuous coordinate-based representation of dynamic surfaces, i.e., neural deformation fields (NDFs); it supervises NDF evolution with the rules of the non-linear Kirchhoff-Love shell theory. NDFs are adaptive in the sense that they 1) allocate their capacity to the deformation details as the latter arise during the cloth evolution and 2) allow surface state queries at arbitrary spatial and temporal resolutions without retraining. We show how to train our NeuralClothSim solver while imposing hard boundary conditions and demonstrate multiple applications, such as material interpolation and simulation editing. The experimental results highlight the effectiveness of our formulation and its potential impact.
    False Information, Bots and Malicious Campaigns: Demystifying Elements of Social Media Manipulations. (arXiv:2308.12497v1 [cs.SI])
    The rapid spread of false information and persistent manipulation attacks on online social networks (OSNs), often for political, ideological, or financial gain, has affected the openness of OSNs. While researchers from various disciplines have investigated different manipulation-triggering elements of OSNs (such as understanding information diffusion on OSNs or detecting automated behavior of accounts), these works have not been consolidated to present a comprehensive overview of the interconnections among these elements. Notably, user psychology, the prevalence of bots, and their tactics in relation to false information detection have been overlooked in previous research. To address this research gap, this paper synthesizes insights from various disciplines to provide a comprehensive analysis of the manipulation landscape. By integrating the primary elements of social media manipulation (SMM), including false information, bots, and malicious campaigns, we extensively examine each SMM element. Through a systematic investigation of prior research, we identify commonalities, highlight existing gaps, and extract valuable insights in the field. Our findings underscore the urgent need for interdisciplinary research to effectively combat social media manipulations, and our systematization can guide future research efforts and assist OSN providers in ensuring the safety and integrity of their platforms.
    Continuous Reinforcement Learning-based Dynamic Difficulty Adjustment in a Visual Working Memory Game. (arXiv:2308.12726v1 [cs.HC])
    Dynamic Difficulty Adjustment (DDA) is a viable approach to enhance a player's experience in video games. Recently, Reinforcement Learning (RL) methods have been employed for DDA in non-competitive games; nevertheless, they rely solely on discrete state-action space with a small search space. In this paper, we propose a continuous RL-based DDA methodology for a visual working memory (VWM) game to handle the complex search space for the difficulty of memorization. The proposed RL-based DDA tailors game difficulty based on the player's score and game difficulty in the last trial. We defined a continuous metric for the difficulty of memorization. Then, we consider the task difficulty and the vector of difficulty-score as the RL's action and state, respectively. We evaluated the proposed method through a within-subject experiment involving 52 subjects. The proposed approach was compared with two rule-based difficulty adjustment methods in terms of player's score and game experience measured by a questionnaire. The proposed RL-based approach resulted in a significantly better game experience in terms of competence, tension, and negative and positive affect. Players also achieved higher scores and win rates. Furthermore, the proposed RL-based DDA led to a significantly less decline in the score in a 20-trial session.
    Machine learning in parameter estimation of nonlinear systems. (arXiv:2308.12393v1 [cs.LG])
    Accurately estimating parameters in complex nonlinear systems is crucial across scientific and engineering fields. We present a novel approach for parameter estimation using a neural network with the Huber loss function. This method taps into deep learning's abilities to uncover parameters governing intricate behaviors in nonlinear equations. We validate our approach using synthetic data and predefined functions that model system dynamics. By training the neural network with noisy time series data, it fine-tunes the Huber loss function to converge to accurate parameters. We apply our method to damped oscillators, Van der Pol oscillators, Lotka-Volterra systems, and Lorenz systems under multiplicative noise. The trained neural network accurately estimates parameters, evident from closely matching latent dynamics. Comparing true and estimated trajectories visually reinforces our method's precision and robustness. Our study underscores the Huber loss-guided neural network as a versatile tool for parameter estimation, effectively uncovering complex relationships in nonlinear systems. The method navigates noise and uncertainty adeptly, showcasing its adaptability to real-world challenges.
    Exploiting Time-Frequency Conformers for Music Audio Enhancement. (arXiv:2308.12599v1 [cs.SD])
    With the proliferation of video platforms on the internet, recording musical performances by mobile devices has become commonplace. However, these recordings often suffer from degradation such as noise and reverberation, which negatively impact the listening experience. Consequently, the necessity for music audio enhancement (referred to as music enhancement from this point onward), involving the transformation of degraded audio recordings into pristine high-quality music, has surged to augment the auditory experience. To address this issue, we propose a music enhancement system based on the Conformer architecture that has demonstrated outstanding performance in speech enhancement tasks. Our approach explores the attention mechanisms of the Conformer and examines their performance to discover the best approach for the music enhancement task. Our experimental results show that our proposed model achieves state-of-the-art performance on single-stem music enhancement. Furthermore, our system can perform general music enhancement with multi-track mixtures, which has not been examined in previous work.
    LORD: Leveraging Open-Set Recognition with Unknown Data. (arXiv:2308.12584v1 [cs.CV])
    Handling entirely unknown data is a challenge for any deployed classifier. Classification models are typically trained on a static pre-defined dataset and are kept in the dark for the open unassigned feature space. As a result, they struggle to deal with out-of-distribution data during inference. Addressing this task on the class-level is termed open-set recognition (OSR). However, most OSR methods are inherently limited, as they train closed-set classifiers and only adapt the downstream predictions to OSR. This work presents LORD, a framework to Leverage Open-set Recognition by exploiting unknown Data. LORD explicitly models open space during classifier training and provides a systematic evaluation for such approaches. We identify three model-agnostic training strategies that exploit background data and applied them to well-established classifiers. Due to LORD's extensive evaluation protocol, we consistently demonstrate improved recognition of unknown data. The benchmarks facilitate in-depth analysis across various requirement levels. To mitigate dependency on extensive and costly background datasets, we explore mixup as an off-the-shelf data generation technique. Our experiments highlight mixup's effectiveness as a substitute for background datasets. Lightweight constraints on mixup synthesis further improve OSR performance.
    The GENEA Challenge 2023: A large scale evaluation of gesture generation models in monadic and dyadic settings. (arXiv:2308.12646v1 [cs.HC])
    This paper reports on the GENEA Challenge 2023, in which participating teams built speech-driven gesture-generation systems using the same speech and motion dataset, followed by a joint evaluation. This year's challenge provided data on both sides of a dyadic interaction, allowing teams to generate full-body motion for an agent given its speech (text and audio) and the speech and motion of the interlocutor. We evaluated 12 submissions and 2 baselines together with held-out motion-capture data in several large-scale user studies. The studies focused on three aspects: 1) the human-likeness of the motion, 2) the appropriateness of the motion for the agent's own speech whilst controlling for the human-likeness of the motion, and 3) the appropriateness of the motion for the behaviour of the interlocutor in the interaction, using a setup that controls for both the human-likeness of the motion and the agent's own speech. We found a large span in human-likeness between challenge submissions, with a few systems rated close to human mocap. Appropriateness seems far from being solved, with most submissions performing in a narrow range slightly above chance, far behind natural motion. The effect of the interlocutor is even more subtle, with submitted systems at best performing barely above chance. Interestingly, a dyadic system being highly appropriate for agent speech does not necessarily imply high appropriateness for the interlocutor. Additional material is available via the project website at https://svito-zar.github.io/GENEAchallenge2023/ .
    IP-UNet: Intensity Projection UNet Architecture for 3D Medical Volume Segmentation. (arXiv:2308.12761v1 [eess.IV])
    CNNs have been widely applied for medical image analysis. However, limited memory capacity is one of the most common drawbacks of processing high-resolution 3D volumetric data. 3D volumes are usually cropped or downsized first before processing, which can result in a loss of resolution, increase class imbalance, and affect the performance of the segmentation algorithms. In this paper, we propose an end-to-end deep learning approach called IP-UNet. IP-UNet is a UNet-based model that performs multi-class segmentation on Intensity Projection (IP) of 3D volumetric data instead of the memory-consuming 3D volumes. IP-UNet uses limited memory capability for training without losing the original 3D image resolution. We compare the performance of three models in terms of segmentation accuracy and computational cost: 1) Slice-by-slice 2D segmentation of the CT scan images using a conventional 2D UNet model. 2) IP-UNet that operates on data obtained by merging the extracted Maximum Intensity Projection (MIP), Closest Vessel Projection (CVP), and Average Intensity Projection (AvgIP) representations of the source 3D volumes, then applying the UNet model on the output IP images. 3) 3D-UNet model directly reads the 3D volumes constructed from a series of CT scan images and outputs the 3D volume of the predicted segmentation. We test the performance of these methods on 3D volumetric images for automatic breast calcification detection. Experimental results show that IP-Unet can achieve similar segmentation accuracy with 3D-Unet but with much better performance. It reduces the training time by 70\% and memory consumption by 92\%.
    An Efficient Data Analysis Method for Big Data using Multiple-Model Linear Regression. (arXiv:2308.12691v1 [cs.LG])
    This paper introduces a new data analysis method for big data using a newly defined regression model named multiple model linear regression(MMLR), which separates input datasets into subsets and construct local linear regression models of them. The proposed data analysis method is shown to be more efficient and flexible than other regression based methods. This paper also proposes an approximate algorithm to construct MMLR models based on $(\epsilon,\delta)$-estimator, and gives mathematical proofs of the correctness and efficiency of MMLR algorithm, of which the time complexity is linear with respect to the size of input datasets. This paper also empirically implements the method on both synthetic and real-world datasets, the algorithm shows to have comparable performance to existing regression methods in many cases, while it takes almost the shortest time to provide a high prediction accuracy.
    On the Consistency of Average Embeddings for Item Recommendation. (arXiv:2308.12767v1 [cs.IR])
    A prevalent practice in recommender systems consists of averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation, which paves the way for future research to better align real-world embeddings with assumptions from our theoretical setting.
    FIESTA: Autoencoders for accurate fiber segmentation in tractography. (arXiv:2212.00143v3 [cs.CV] UPDATED)
    White matter bundle segmentation is a cornerstone of modern tractography to study the brain's structural connectivity in domains such as neurological disorders, neurosurgery, and aging. In this study, we present FIESTA (FIbEr Segmentation in Tractography using Autoencoders), a reliable and robust, fully automated, and easily semi-automatically calibrated pipeline based on deep autoencoders that can dissect and fully populate white matter bundles. This pipeline is built upon previous works that demonstrated how autoencoders can be used successfully for streamline filtering, bundle segmentation, and streamline generation in tractography. Our proposed method improves bundle segmentation coverage by recovering hard-to-track bundles with generative sampling through the latent space seeding of the subject bundle and the atlas bundle. A latent space of streamlines is learned using autoencoder-based modeling combined with contrastive learning. Using an atlas of bundles in standard space (MNI), our proposed method segments new tractograms using the autoencoder latent distance between each tractogram streamline and its closest neighbor bundle in the atlas of bundles. Intra-subject bundle reliability is improved by recovering hard-to-track streamlines, using the autoencoder to generate new streamlines that increase the spatial coverage of each bundle while remaining anatomically correct. Results show that our method is more reliable than state-of-the-art automated virtual dissection methods such as RecoBundles, RecoBundlesX, TractSeg, White Matter Analysis and XTRACT. Our framework allows for the transition from one anatomical bundle definition to another with marginal calibration efforts. Overall, these results show that our framework improves the practicality and usability of current state-of-the-art bundle segmentation framework.
    Augmenting medical image classifiers with synthetic data from latent diffusion models. (arXiv:2308.12453v1 [cs.CV])
    While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drugs Administration (FDA), many studies have shown inconsistent generalization or latent bias, particularly for underrepresented populations. Some have proposed that generative AI could reduce the need for real data, but its utility in model development remains unclear. Skin disease serves as a useful case study in synthetic image generation due to the diversity of disease appearance, particularly across the protected attribute of skin tone. Here we show that latent diffusion models can scalably generate images of skin disease and that augmenting model training with these data improves performance in data-limited settings. These performance gains saturate at synthetic-to-real image ratios above 10:1 and are substantially smaller than the gains obtained from adding real images. As part of our analysis, we generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies. Our results suggest that synthetic data could serve as a force-multiplier for model development, but the collection of diverse real-world data remains the most important step to improve medical AI algorithms.
    Fat Shattering, Joint Measurability, and PAC Learnability of POVM Hypothesis Classes. (arXiv:2308.12304v1 [stat.ML])
    We characterize learnability for quantum measurement classes by establishing matching necessary and sufficient conditions for their PAC learnability, along with corresponding sample complexity bounds, in the setting where the learner is given access only to prepared quantum states. We first probe the results from previous works on this setting. We show that the empirical risk defined in previous works and matching the definition in the classical theory fails to satisfy the uniform convergence property enjoyed in the classical setting for some learnable classes. Moreover, we show that VC dimension generalization upper bounds in previous work are frequently infinite, even for finite-dimensional POVM classes. To surmount the failure of the standard ERM to satisfy uniform convergence, we define a new learning rule -- denoised ERM. We show this to be a universal learning rule for POVM and probabilistically observed concept classes, and the condition for it to satisfy uniform convergence is finite fat shattering dimension of the class. We give quantitative sample complexity upper and lower bounds for learnability in terms of finite fat-shattering dimension and a notion of approximate finite partitionability into approximately jointly measurable subsets, which allow for sample reuse. We then show that finite fat shattering dimension implies finite coverability by approximately jointly measurable subsets, leading to our matching conditions. We also show that every measurement class defined on a finite-dimensional Hilbert space is PAC learnable. We illustrate our results on several example POVM classes.
    Don't blame Dataset Shift! Shortcut Learning due to Gradients and Cross Entropy. (arXiv:2308.12553v1 [cs.LG])
    Common explanations for shortcut learning assume that the shortcut improves prediction under the training distribution but not in the test distribution. Thus, models trained via the typical gradient-based optimization of cross-entropy, which we call default-ERM, utilize the shortcut. However, even when the stable feature determines the label in the training distribution and the shortcut does not provide any additional information, like in perception tasks, default-ERM still exhibits shortcut learning. Why are such solutions preferred when the loss for default-ERM can be driven to zero using the stable feature alone? By studying a linear perception task, we show that default-ERM's preference for maximizing the margin leads to models that depend more on the shortcut than the stable feature, even without overparameterization. This insight suggests that default-ERM's implicit inductive bias towards max-margin is unsuitable for perception tasks. Instead, we develop an inductive bias toward uniform margins and show that this bias guarantees dependence only on the perfect stable feature in the linear perception task. We develop loss functions that encourage uniform-margin solutions, called margin control (MARG-CTRL). MARG-CTRL mitigates shortcut learning on a variety of vision and language tasks, showing that better inductive biases can remove the need for expensive two-stage shortcut-mitigating methods in perception tasks.
    PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts. (arXiv:2306.04528v3 [cs.CL] UPDATED)
    The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptBench, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic. These prompts are then employed in diverse tasks, such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,032 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets, with 567,084 test samples in total. Our findings demonstrate that contemporary LLMs are vulnerable to adversarial prompts. Furthermore, we present comprehensive analysis to understand the mystery behind prompt robustness and its transferability. We then offer insightful robustness analysis and pragmatic recommendations for prompt composition, beneficial to both researchers and everyday users. We make our code, prompts, and methodologies to generate adversarial prompts publicly accessible, thereby enabling and encouraging collaborative exploration in this pivotal field: https://github.com/microsoft/promptbench.
    Towards Hierarchical Regional Transformer-based Multiple Instance Learning. (arXiv:2308.12634v1 [cs.CV])
    The classification of gigapixel histopathology images with deep multiple instance learning models has become a critical task in digital pathology and precision medicine. In this work, we propose a Transformer-based multiple instance learning approach that replaces the traditional learned attention mechanism with a regional, Vision Transformer inspired self-attention mechanism. We present a method that fuses regional patch information to derive slide-level predictions and show how this regional aggregation can be stacked to hierarchically process features on different distance levels. To increase predictive accuracy, especially for datasets with small, local morphological features, we introduce a method to focus the image processing on high attention regions during inference. Our approach is able to significantly improve performance over the baseline on two histopathology datasets and points towards promising directions for further research.
    Symplectic model reduction of Hamiltonian systems using data-driven quadratic manifolds. (arXiv:2305.15490v2 [math.NA] UPDATED)
    This work presents two novel approaches for the symplectic model reduction of high-dimensional Hamiltonian systems using data-driven quadratic manifolds. Classical symplectic model reduction approaches employ linear symplectic subspaces for representing the high-dimensional system states in a reduced-dimensional coordinate system. While these approximations respect the symplectic nature of Hamiltonian systems, linear basis approximations can suffer from slowly decaying Kolmogorov $N$-width, especially in wave-type problems, which then requires a large basis size. We propose two different model reduction methods based on recently developed quadratic manifolds, each presenting its own advantages and limitations. The addition of quadratic terms to the state approximation, which sits at the heart of the proposed methodologies, enables us to better represent intrinsic low-dimensionality in the problem at hand. Both approaches are effective for issuing predictions in settings well outside the range of their training data while providing more accurate solutions than the linear symplectic reduced-order models.
    Deploying Deep Reinforcement Learning Systems: A Taxonomy of Challenges. (arXiv:2308.12438v1 [cs.LG])
    Deep reinforcement learning (DRL), leveraging Deep Learning (DL) in reinforcement learning, has shown significant potential in achieving human-level autonomy in a wide range of domains, including robotics, computer vision, and computer games. This potential justifies the enthusiasm and growing interest in DRL in both academia and industry. However, the community currently focuses mostly on the development phase of DRL systems, with little attention devoted to DRL deployment. In this paper, we propose an empirical study on Stack Overflow (SO), the most popular Q&A forum for developers, to uncover and understand the challenges practitioners faced when deploying DRL systems. Specifically, we categorized relevant SO posts by deployment platforms: server/cloud, mobile/embedded system, browser, and game engine. After filtering and manual analysis, we examined 357 SO posts about DRL deployment, investigated the current state, and identified the challenges related to deploying DRL systems. Then, we investigate the prevalence and difficulty of these challenges. Results show that the general interest in DRL deployment is growing, confirming the study's relevance and importance. Results also show that DRL deployment is more difficult than other DRL issues. Additionally, we built a taxonomy of 31 unique challenges in deploying DRL to different platforms. On all platforms, RL environment-related challenges are the most popular, and communication-related challenges are the most difficult among practitioners. We hope our study inspires future research and helps the community overcome the most common and difficult challenges practitioners face when deploying DRL systems.
    Inferring gender from name: a large scale performance evaluation study. (arXiv:2308.12381v1 [cs.CL])
    A person's gender is a crucial piece of information when performing research across a wide range of scientific disciplines, such as medicine, sociology, political science, and economics, to name a few. However, in increasing instances, especially given the proliferation of big data, gender information is not readily available. In such cases researchers need to infer gender from readily available information, primarily from persons' names. While inferring gender from name may raise some ethical questions, the lack of viable alternatives means that researchers have to resort to such approaches when the goal justifies the means - in the majority of such studies the goal is to examine patterns and determinants of gender disparities. The necessity of name-to-gender inference has generated an ever-growing domain of algorithmic approaches and software products. These approaches have been used throughout the world in academia, industry, governmental and non-governmental organizations. Nevertheless, the existing approaches have yet to be systematically evaluated and compared, making it challenging to determine the optimal approach for future research. In this work, we conducted a large scale performance evaluation of existing approaches for name-to-gender inference. Analysis are performed using a variety of large annotated datasets of names. We further propose two new hybrid approaches that achieve better performance than any single existing approach.
    A multiobjective continuation method to compute the regularization path of deep neural networks. (arXiv:2308.12044v2 [cs.LG] UPDATED)
    Sparsity is a highly desired feature in deep neural networks (DNNs) since it ensures numerical efficiency, improves the interpretability of models (due to the smaller number of relevant features), and robustness. In machine learning approaches based on linear models, it is well known that there exists a connecting path between the sparsest solution in terms of the $\ell^1$ norm (i.e., zero weights) and the non-regularized solution, which is called the regularization path. Very recently, there was a first attempt to extend the concept of regularization paths to DNNs by means of treating the empirical loss and sparsity ($\ell^1$ norm) as two conflicting criteria and solving the resulting multiobjective optimization problem. However, due to the non-smoothness of the $\ell^1$ norm and the high number of parameters, this approach is not very efficient from a computational perspective. To overcome this limitation, we present an algorithm that allows for the approximation of the entire Pareto front for the above-mentioned objectives in a very efficient manner. We present numerical examples using both deterministic and stochastic gradients. We furthermore demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization.
    Match-And-Deform: Time Series Domain Adaptation through Optimal Transport and Temporal Alignment. (arXiv:2308.12686v1 [cs.LG])
    While large volumes of unlabeled data are usually available, associated labels are often scarce. The unsupervised domain adaptation problem aims at exploiting labels from a source domain to classify data from a related, yet different, target domain. When time series are at stake, new difficulties arise as temporal shifts may appear in addition to the standard feature distribution shift. In this paper, we introduce the Match-And-Deform (MAD) approach that aims at finding correspondences between the source and target time series while allowing temporal distortions. The associated optimization problem simultaneously aligns the series thanks to an optimal transport loss and the time stamps through dynamic time warping. When embedded into a deep neural network, MAD helps learning new representations of time series that both align the domains and maximize the discriminative power of the network. Empirical studies on benchmark datasets and remote sensing data demonstrate that MAD makes meaningful sample-to-sample pairing and time shift estimation, reaching similar or better classification performance than state-of-the-art deep time series domain adaptation strategies.
    Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation. (arXiv:2308.12968v1 [cs.CV])
    Automatic high-quality rendering of anime scenes from complex real-world images is of significant practical value. The challenges of this task lie in the complexity of the scenes, the unique features of anime style, and the lack of high-quality datasets to bridge the domain gap. Despite promising attempts, previous efforts are still incompetent in achieving satisfactory results with consistent semantic preservation, evident stylization, and fine details. In this study, we propose Scenimefy, a novel semi-supervised image-to-image translation framework that addresses these challenges. Our approach guides the learning with structure-consistent pseudo paired data, simplifying the pure unsupervised setting. The pseudo data are derived uniquely from a semantic-constrained StyleGAN leveraging rich model priors like CLIP. We further apply segmentation-guided data selection to obtain high-quality pseudo supervision. A patch-wise contrastive style loss is introduced to improve stylization and fine details. Besides, we contribute a high-resolution anime scene dataset to facilitate future research. Our extensive experiments demonstrate the superiority of our method over state-of-the-art baselines in terms of both perceptual quality and quantitative performance.
    Exact Manifold Gaussian Variational Bayes. (arXiv:2210.14598v3 [stat.ML] UPDATED)
    We propose an optimization algorithm for Variational Inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemann manifold. We develop an efficient algorithm for Gaussian Variational Inference that implicitly satisfies the positive definite constraint on the variational covariance matrix. Our Exact manifold Gaussian Variational Bayes (EMGVB) provides exact but simple update rules and is straightforward to implement. Due to its black-box nature, EMGVB stands as a ready-to-use solution for VI in complex models. Over five datasets, we empirically validate our feasible approach on different statistical, econometric, and deep learning models, discussing its performance with respect to baseline methods.
    Low-count Time Series Anomaly Detection. (arXiv:2308.12925v1 [cs.LG])
    Low-count time series describe sparse or intermittent events, which are prevalent in large-scale online platforms that capture and monitor diverse data types. Several distinct challenges surface when modelling low-count time series, particularly low signal-to-noise ratios (when anomaly signatures are provably undetectable), and non-uniform performance (when average metrics are not representative of local behaviour). The time series anomaly detection community currently lacks explicit tooling and processes to model and reliably detect anomalies in these settings. We address this gap by introducing a novel generative procedure for creating benchmark datasets comprising of low-count time series with anomalous segments. Via a mixture of theoretical and empirical analysis, our work explains how widely-used algorithms struggle with the distribution overlap between normal and anomalous segments. In order to mitigate this shortcoming, we then leverage our findings to demonstrate how anomaly score smoothing consistently improves performance. The practical utility of our analysis and recommendation is validated on a real-world dataset containing sales data for retail stores.
    Synthesize High-dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model. (arXiv:2304.02169v2 [cs.LG] UPDATED)
    Synthetic electronic health records (EHRs) that are both realistic and preserve privacy can serve as an alternative to real EHRs for machine learning (ML) modeling and statistical analysis. However, generating high-fidelity and granular electronic health record (EHR) data in its original, highly-dimensional form poses challenges for existing methods due to the complexities inherent in high-dimensional data. In this paper, we propose Hierarchical Autoregressive Language mOdel (HALO) for generating longitudinal high-dimensional EHR, which preserve the statistical properties of real EHR and can be used to train accurate ML models without privacy concerns. Our HALO method, designed as a hierarchical autoregressive model, generates a probability density function of medical codes, clinical visits, and patient records, allowing for the generation of realistic EHR data in its original, unaggregated form without the need for variable selection or aggregation. Additionally, our model also produces high-quality continuous variables in a longitudinal and probabilistic manner. We conducted extensive experiments and demonstrate that HALO can generate high-fidelity EHR data with high-dimensional disease code probabilities (d > 10,000), disease co-occurrence probabilities within visits (d > 1,000,000), and conditional probabilities across consecutive visits (d > 5,000,000) and achieve above 0.9 R2 correlation in comparison to real EHR data. This performance then enables downstream ML models trained on its synthetic data to achieve comparable accuracy to models trained on real data (0.938 AUROC with HALO data vs. 0.943 with real data). Finally, using a combination of real and synthetic data enhances the accuracy of ML models beyond that achieved by using only real EHR data.
    SieveNet: Selecting Point-Based Features for Mesh Networks. (arXiv:2308.12530v1 [cs.CV])
    Meshes are widely used in 3D computer vision and graphics, but their irregular topology poses challenges in applying them to existing neural network architectures. Recent advances in mesh neural networks turn to remeshing and push the boundary of pioneer methods that solely take the raw meshes as input. Although the remeshing offers a regular topology that significantly facilitates the design of mesh network architectures, features extracted from such remeshed proxies may struggle to retain the underlying geometry faithfully, limiting the subsequent neural network's capacity. To address this issue, we propose SieveNet, a novel paradigm that takes into account both the regular topology and the exact geometry. Specifically, this method utilizes structured mesh topology from remeshing and accurate geometric information from distortion-aware point sampling on the surface of the original mesh. Furthermore, our method eliminates the need for hand-crafted feature engineering and can leverage off-the-shelf network architectures such as the vision transformer. Comprehensive experimental results on classification and segmentation tasks well demonstrate the effectiveness and superiority of our method.
    Improving Generative Model-based Unfolding with Schr\"{o}dinger Bridges. (arXiv:2308.12351v1 [hep-ph])
    Machine learning-based unfolding has enabled unbinned and high-dimensional differential cross section measurements. Two main approaches have emerged in this research area: one based on discriminative models and one based on generative models. The main advantage of discriminative models is that they learn a small correction to a starting simulation while generative models scale better to regions of phase space with little data. We propose to use Schroedinger Bridges and diffusion models to create SBUnfold, an unfolding approach that combines the strengths of both discriminative and generative models. The key feature of SBUnfold is that its generative model maps one set of events into another without having to go through a known probability density as is the case for normalizing flows and standard diffusion models. We show that SBUnfold achieves excellent performance compared to state of the art methods on a synthetic Z+jets dataset.
    Evaluation of ChatGPT on Biomedical Tasks: A Zero-Shot Comparison with Fine-Tuned Generative Transformers. (arXiv:2306.04504v3 [cs.CL] UPDATED)
    ChatGPT is a large language model developed by OpenAI. Despite its impressive performance across various tasks, no prior work has investigated its capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of ChatGPT on various benchmark biomedical tasks, such as relation extraction, document classification, question answering, and summarization. To the best of our knowledge, this is the first work that conducts an extensive evaluation of ChatGPT in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot ChatGPT even outperforms the state-of-the-art fine-tuned generative transformer models, such as BioGPT and BioBART. This suggests that ChatGPT's pre-training on large text corpora makes it quite specialized even in the biomedical domain. Our findings demonstrate that ChatGPT has the potential to be a valuable tool for various tasks in the biomedical domain that lack large annotated data.
    Masked Autoencoders are Efficient Class Incremental Learners. (arXiv:2308.12510v1 [cs.CV])
    Class Incremental Learning (CIL) aims to sequentially learn new classes while avoiding catastrophic forgetting of previous knowledge. We propose to use Masked Autoencoders (MAEs) as efficient learners for CIL. MAEs were originally designed to learn useful representations through reconstructive unsupervised learning, and they can be easily integrated with a supervised loss for classification. Moreover, MAEs can reliably reconstruct original input images from randomly selected patches, which we use to store exemplars from past tasks more efficiently for CIL. We also propose a bilateral MAE framework to learn from image-level and embedding-level fusion, which produces better-quality reconstructed images and more stable representations. Our experiments confirm that our approach performs better than the state-of-the-art on CIFAR-100, ImageNet-Subset, and ImageNet-Full. The code is available at https://github.com/scok30/MAE-CIL .
    Zero-delay Consistent Signal Reconstruction from Streamed Multivariate Time Series. (arXiv:2308.12459v1 [eess.SP])
    Digitalizing real-world analog signals typically involves sampling in time and discretizing in amplitude. Subsequent signal reconstructions inevitably incur an error that depends on the amplitude resolution and the temporal density of the acquired samples. From an implementation viewpoint, consistent signal reconstruction methods have proven a profitable error-rate decay as the sampling rate increases. Despite that, these results are obtained under offline settings. Therefore, a research gap exists regarding methods for consistent signal reconstruction from data streams. This paper presents a method that consistently reconstructs streamed multivariate time series of quantization intervals under a zero-delay response requirement. On the other hand, previous work has shown that the temporal dependencies within univariate time series can be exploited to reduce the roughness of zero-delay signal reconstructions. This work shows that the spatiotemporal dependencies within multivariate time series can also be exploited to achieve improved results. Specifically, the spatiotemporal dependencies of the multivariate time series are learned, with the assistance of a recurrent neural network, to reduce the roughness of the signal reconstruction on average while ensuring consistency. Our experiments show that our proposed method achieves a favorable error-rate decay with the sampling rate compared to a similar but non-consistent reconstruction.
    A Huber Loss Minimization Approach to Byzantine Robust Federated Learning. (arXiv:2308.12581v1 [cs.LG])
    Federated learning systems are susceptible to adversarial attacks. To combat this, we introduce a novel aggregator based on Huber loss minimization, and provide a comprehensive theoretical analysis. Under independent and identically distributed (i.i.d) assumption, our approach has several advantages compared to existing methods. Firstly, it has optimal dependence on $\epsilon$, which stands for the ratio of attacked clients. Secondly, our approach does not need precise knowledge of $\epsilon$. Thirdly, it allows different clients to have unequal data sizes. We then broaden our analysis to include non-i.i.d data, such that clients have slightly different distributions.
    Label Budget Allocation in Multi-Task Learning. (arXiv:2308.12949v1 [cs.LG])
    The cost of labeling data often limits the performance of machine learning systems. In multi-task learning, related tasks provide information to each other and improve overall performance, but the label cost can vary among tasks. How should the label budget (i.e. the amount of money spent on labeling) be allocated among different tasks to achieve optimal multi-task performance? We are the first to propose and formally define the label budget allocation problem in multi-task learning and to empirically show that different budget allocation strategies make a big difference to its performance. We propose a Task-Adaptive Budget Allocation algorithm to robustly generate the optimal budget allocation adaptive to different multi-task learning settings. Specifically, we estimate and then maximize the extent of new information obtained from the allocated budget as a proxy for multi-task learning performance. Experiments on PASCAL VOC and Taskonomy demonstrate the efficacy of our approach over other widely used heuristic labeling strategies.
    Solving Forward and Inverse Problems of Contact Mechanics using Physics-Informed Neural Networks. (arXiv:2308.12716v1 [math.NA])
    This paper explores the ability of physics-informed neural networks (PINNs) to solve forward and inverse problems of contact mechanics for small deformation elasticity. We deploy PINNs in a mixed-variable formulation enhanced by output transformation to enforce Dirichlet and Neumann boundary conditions as hard constraints. Inequality constraints of contact problems, namely Karush-Kuhn-Tucker (KKT) type conditions, are enforced as soft constraints by incorporating them into the loss function during network training. To formulate the loss function contribution of KKT constraints, existing approaches applied to elastoplasticity problems are investigated and we explore a nonlinear complementarity problem (NCP) function, namely Fischer-Burmeister, which possesses advantageous characteristics in terms of optimization. Based on the Hertzian contact problem, we show that PINNs can serve as pure partial differential equation (PDE) solver, as data-enhanced forward model, as inverse solver for parameter identification, and as fast-to-evaluate surrogate model. Furthermore, we demonstrate the importance of choosing proper hyperparameters, e.g. loss weights, and a combination of Adam and L-BFGS-B optimizers aiming for better results in terms of accuracy and training time.
    Out of the Box Thinking: Improving Customer Lifetime Value Modelling via Expert Routing and Game Whale Detection. (arXiv:2308.12729v1 [cs.IR])
    Customer lifetime value (LTV) prediction is essential for mobile game publishers trying to optimize the advertising investment for each user acquisition based on the estimated worth. In mobile games, deploying microtransactions is a simple yet effective monetization strategy, which attracts a tiny group of game whales who splurge on in-game purchases. The presence of such game whales may impede the practicality of existing LTV prediction models, since game whales' purchase behaviours always exhibit varied distribution from general users. Consequently, identifying game whales can open up new opportunities to improve the accuracy of LTV prediction models. However, little attention has been paid to applying game whale detection in LTV prediction, and existing works are mainly specialized for the long-term LTV prediction with the assumption that the high-quality user features are available, which is not applicable in the UA stage. In this paper, we propose ExpLTV, a novel multi-task framework to perform LTV prediction and game whale detection in a unified way. In ExpLTV, we first innovatively design a deep neural network-based game whale detector that can not only infer the intrinsic order in accordance with monetary value, but also precisely identify high spenders (i.e., game whales) and low spenders. Then, by treating the game whale detector as a gating network to decide the different mixture patterns of LTV experts assembling, we can thoroughly leverage the shared information and scenario-specific information (i.e., game whales modelling and low spenders modelling). Finally, instead of separately designing a purchase rate estimator for two tasks, we design a shared estimator that can preserve the inner task relationships. The superiority of ExpLTV is further validated via extensive experiments on three industrial datasets.
    Fall Detection using Knowledge Distillation Based Long short-term memory for Offline Embedded and Low Power Devices. (arXiv:2308.12481v1 [eess.SP])
    This paper presents a cost-effective, low-power approach to unintentional fall detection using knowledge distillation-based LSTM (Long Short-Term Memory) models to significantly improve accuracy. With a primary focus on analyzing time-series data collected from various sensors, the solution offers real-time detection capabilities, ensuring prompt and reliable identification of falls. The authors investigate fall detection models that are based on different sensors, comparing their accuracy rates and performance. Furthermore, they employ the technique of knowledge distillation to enhance the models' precision, resulting in refined accurate configurations that consume lower power. As a result, this proposed solution presents a compelling avenue for the development of energy-efficient fall detection systems for future advancements in this critical domain.
    Efficient Adaptive Activation Rounding for Post-Training Quantization. (arXiv:2208.11945v3 [cs.LG] UPDATED)
    Post-training quantization attracts increasing attention due to its convenience in deploying quantized neural networks. Although rounding-to-nearest remains the prevailing method for DNN quantization, prior research has demonstrated its suboptimal nature when applied to weight quantization. They propose optimizing weight rounding schemes by leveraging output error rather than the traditional weight quantization error. Our study reveals that similar rounding challenges also extend to activation quantization. Despite the easy generalization, the challenges lie in the dynamic nature of activation. Adaptive rounding is expected for varying activations and the method is subjected to runtime overhead. To tackle this, we propose the AQuant quantization framework with a novel perspective to reduce output error by adjusting rounding schemes of activations. Instead of using the constant rounding border 0.5 of the rounding-to-nearest operation, we make the border become a function w.r.t. the activation value to change the activation rounding by the adaptive border. To deal with the runtime overhead, we use a coarse-grained version of the border function. Finally, we introduce our framework to optimize the border function. Extensive experiments show that AQuant achieves notable improvements compared to state-of-the-art works and pushes the accuracy of ResNet-18 up to 60.31% under the 2-bit weight and activation quantization.
    Tackling Face Verification Edge Cases: In-Depth Analysis and Human-Machine Fusion Approach. (arXiv:2304.08134v4 [cs.CV] UPDATED)
    Nowadays, face recognition systems surpass human performance on several datasets. However, there are still edge cases that the machine can't correctly classify. This paper investigates the effect of a combination of machine and human operators in the face verification task. First, we look closer at the edge cases for several state-of-the-art models to discover common datasets' challenging settings. Then, we conduct a study with 60 participants on these selected tasks with humans and provide an extensive analysis. Finally, we demonstrate that combining machine and human decisions can further improve the performance of state-of-the-art face verification systems on various benchmark datasets. Code and data are publicly available on GitHub.
    HypBO: Expert-Guided Chemist-in-the-Loop Bayesian Search for New Materials. (arXiv:2308.11787v2 [cs.LG] UPDATED)
    Robotics and automation offer massive accelerations for solving intractable, multivariate scientific problems such as materials discovery, but the available search spaces can be dauntingly large. Bayesian optimization (BO) has emerged as a popular sample-efficient optimization engine, thriving in tasks where no analytic form of the target function/property is known. Here we exploit expert human knowledge in the form of hypotheses to direct Bayesian searches more quickly to promising regions of chemical space. Previous methods have used underlying distributions derived from existing experimental measurements, which is unfeasible for new, unexplored scientific tasks. Also, such distributions cannot capture intricate hypotheses. Our proposed method, which we call HypBO, uses expert human hypotheses to generate an improved seed of samples. Unpromising seeds are automatically discounted, while promising seeds are used to augment the surrogate model data, thus achieving better-informed sampling. This process continues in a global versus local search fashion, organized in a bilevel optimization framework. We validate the performance of our method on a range of synthetic functions and demonstrate its practical utility on a real chemical design task where the use of expert hypotheses accelerates the search performance significantly.
    CALM : A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias. (arXiv:2308.12539v1 [cs.CL])
    As language models (LMs) become increasingly powerful, it is important to quantify and compare them for sociodemographic bias with potential for harm. Prior bias measurement datasets are sensitive to perturbations in their manually designed templates, therefore unreliable. To achieve reliability, we introduce the Comprehensive Assessment of Language Model bias (CALM), a benchmark dataset to quantify bias in LMs across three tasks. We integrate 16 existing datasets across different domains, such as Wikipedia and news articles, to filter 224 templates from which we construct a dataset of 78,400 examples. We compare the diversity of CALM with prior datasets on metrics such as average semantic similarity, and variation in template length, and test the sensitivity to small perturbations. We show that our dataset is more diverse and reliable than previous datasets, thus better capture the breadth of linguistic variation required to reliably evaluate model bias. We evaluate 20 large language models including six prominent families of LMs such as Llama-2. In two LM series, OPT and Bloom, we found that larger parameter models are more biased than lower parameter models. We found the T0 series of models to be the least biased. Furthermore, we noticed a tradeoff between gender and racial bias with increasing model size in some model series. The code is available at https://github.com/vipulgupta1011/CALM.
    BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection. (arXiv:2308.12439v1 [cs.CR])
    We present a novel defense, against backdoor attacks on Deep Neural Networks (DNNs), wherein adversaries covertly implant malicious behaviors (backdoors) into DNNs. Our defense falls within the category of post-development defenses that operate independently of how the model was generated. The proposed defense is built upon a novel reverse engineering approach that can directly extract backdoor functionality of a given backdoored model to a backdoor expert model. The approach is straightforward -- finetuning the backdoored model over a small set of intentionally mislabeled clean samples, such that it unlearns the normal functionality while still preserving the backdoor functionality, and thus resulting in a model (dubbed a backdoor expert model) that can only recognize backdoor inputs. Based on the extracted backdoor expert model, we show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference. Further augmented by an ensemble strategy with a finetuned auxiliary model, our defense, BaDExpert (Backdoor Input Detection with Backdoor Expert), effectively mitigates 16 SOTA backdoor attacks while minimally impacting clean utility. The effectiveness of BaDExpert has been verified on multiple datasets (CIFAR10, GTSRB and ImageNet) across various model architectures (ResNet, VGG, MobileNetV2 and Vision Transformer).
    Federated Learning in Big Model Era: Domain-Specific Multimodal Large Models. (arXiv:2308.11217v3 [cs.LG] UPDATED)
    Multimodal data, which can comprehensively perceive and recognize the physical world, has become an essential path towards general artificial intelligence. However, multimodal large models trained on public datasets often underperform in specific industrial domains. This paper proposes a multimodal federated learning framework that enables multiple enterprises to utilize private domain data to collaboratively train large models for vertical domains, achieving intelligent services across scenarios. The authors discuss in-depth the strategic transformation of federated learning in terms of intelligence foundation and objectives in the era of big model, as well as the new challenges faced in heterogeneous data, model aggregation, performance and cost trade-off, data privacy, and incentive mechanism. The paper elaborates a case study of leading enterprises contributing multimodal data and expert knowledge to city safety operation management , including distributed deployment and efficient coordination of the federated learning platform, technical innovations on data quality improvement based on large model capabilities and efficient joint fine-tuning approaches. Preliminary experiments show that enterprises can enhance and accumulate intelligent capabilities through multimodal model federated learning, thereby jointly creating an smart city model that provides high-quality intelligent services covering energy infrastructure safety, residential community security, and urban operation management. The established federated learning cooperation ecosystem is expected to further aggregate industry, academia, and research resources, realize large models in multiple vertical domains, and promote the large-scale industrial application of artificial intelligence and cutting-edge research on multimodal federated learning.
    Conditional Kernel Imitation Learning for Continuous State Environments. (arXiv:2308.12573v1 [cs.LG])
    Imitation Learning (IL) is an important paradigm within the broader reinforcement learning (RL) methodology. Unlike most of RL, it does not assume availability of reward-feedback. Reward inference and shaping are known to be difficult and error-prone methods particularly when the demonstration data comes from human experts. Classical methods such as behavioral cloning and inverse reinforcement learning are highly sensitive to estimation errors, a problem that is particularly acute in continuous state space problems. Meanwhile, state-of-the-art IL algorithms convert behavioral policy learning problems into distribution-matching problems which often require additional online interaction data to be effective. In this paper, we consider the problem of imitation learning in continuous state space environments based solely on observed behavior, without access to transition dynamics information, reward structure, or, most importantly, any additional interactions with the environment. Our approach is based on the Markov balance equation and introduces a novel conditional kernel density estimation-based imitation learning framework. It involves estimating the environment's transition dynamics using conditional kernel density estimators and seeks to satisfy the probabilistic balance equations for the environment. We establish that our estimators satisfy basic asymptotic consistency requirements. Through a series of numerical experiments on continuous state benchmark environments, we show consistently superior empirical performance over many state-of-the-art IL algorithms.
    Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature. (arXiv:2308.12420v1 [cs.IR])
    Distributed Ledger Technologies (DLTs) have rapidly evolved, necessitating comprehensive insights into their diverse components. However, a systematic literature review that emphasizes the Environmental, Sustainability, and Governance (ESG) components of DLT remains lacking. To bridge this gap, we selected 107 seed papers to build a citation network of 63,083 references and refined it to a corpus of 24,539 publications for analysis. Then, we labeled the named entities in 46 papers according to twelve top-level categories derived from an established technology taxonomy and enhanced the taxonomy by pinpointing DLT's ESG elements. Leveraging transformer-based language models, we fine-tuned a pre-trained language model for a Named Entity Recognition (NER) task using our labeled dataset. We used our fine-tuned language model to distill the corpus to 505 key papers, facilitating a literature review via named entities and temporal graph analysis on DLT evolution in the context of ESG. Our contributions are a methodology to conduct a machine learning-driven systematic literature review in the DLT field, placing a special emphasis on ESG aspects. Furthermore, we present a first-of-its-kind NER dataset, composed of 54,808 named entities, designed for DLT and ESG-related explorations.
    Learning Only On Boundaries: a Physics-Informed Neural operator for Solving Parametric Partial Differential Equations in Complex Geometries. (arXiv:2308.12939v1 [cs.LG])
    Recently deep learning surrogates and neural operators have shown promise in solving partial differential equations (PDEs). However, they often require a large amount of training data and are limited to bounded domains. In this work, we present a novel physics-informed neural operator method to solve parametrized boundary value problems without labeled data. By reformulating the PDEs into boundary integral equations (BIEs), we can train the operator network solely on the boundary of the domain. This approach reduces the number of required sample points from $O(N^d)$ to $O(N^{d-1})$, where $d$ is the domain's dimension, leading to a significant acceleration of the training process. Additionally, our method can handle unbounded problems, which are unattainable for existing physics-informed neural networks (PINNs) and neural operators. Our numerical experiments show the effectiveness of parametrized complex geometries and unbounded problems.
    Anderson Acceleration For Bioinformatics-Based Machine Learning. (arXiv:2302.00347v2 [cs.LG] UPDATED)
    Anderson acceleration (AA) is a well-known method for accelerating the convergence of iterative algorithms, with applications in various fields including deep learning and optimization. Despite its popularity in these areas, the effectiveness of AA in classical machine learning classifiers has not been thoroughly studied. Tabular data, in particular, presents a unique challenge for deep learning models, and classical machine learning models are known to perform better in these scenarios. However, the convergence analysis of these models has received limited attention. To address this gap in research, we implement a support vector machine (SVM) classifier variant that incorporates AA to speed up convergence. We evaluate the performance of our SVM with and without Anderson acceleration on several datasets from the biology domain and demonstrate that the use of AA significantly improves convergence and reduces the training loss as the number of iterations increases. Our findings provide a promising perspective on the potential of Anderson acceleration in the training of simple machine learning classifiers and underscore the importance of further research in this area. By showing the effectiveness of AA in this setting, we aim to inspire more studies that explore the applications of AA in classical machine learning.
    Quantized Radio Map Estimation Using Tensor and Deep Generative Models. (arXiv:2303.01770v2 [eess.SP] UPDATED)
    Spectrum cartography (SC), also known as radio map estimation (RME), aims at crafting multi-domain (e.g., frequency and space) radio power propagation maps from limited sensor measurements. While early methods often lacked theoretical support, recent works have demonstrated that radio maps can be provably recovered using low-dimensional models -- such as the block-term tensor decomposition (BTD) model and certain deep generative models (DGMs) -- of the high-dimensional multi-domain radio signals. However, these existing provable SC approaches assume that sensors send real-valued (full-resolution) measurements to the fusion center, which is unrealistic. This work puts forth a quantized SC framework that generalizes the BTD and DGM-based SC to scenarios where heavily quantized sensor measurements are used. A maximum likelihood estimation (MLE)-based SC framework under a Gaussian quantizer is proposed. Recoverability of the radio map using the MLE criterion are characterized under realistic conditions, e.g., imperfect radio map modeling and noisy measurements. Simulations and real-data experiments are used to showcase the effectiveness of the proposed approach.
    Riemannian Hamiltonian methods for min-max optimization on manifolds. (arXiv:2204.11418v3 [math.OC] UPDATED)
    In this paper, we study min-max optimization problems on Riemannian manifolds. We introduce a Riemannian Hamiltonian function, minimization of which serves as a proxy for solving the original min-max problems. Under the Riemannian Polyak--{\L}ojasiewicz condition on the Hamiltonian function, its minimizer corresponds to the desired min-max saddle point. We also provide cases where this condition is satisfied. For geodesic-bilinear optimization in particular, solving the proxy problem leads to the correct search direction towards global optimality, which becomes challenging with the min-max formulation. To minimize the Hamiltonian function, we propose Riemannian Hamiltonian methods (RHM) and present their convergence analyses. We extend RHM to include consensus regularization and to the stochastic setting. We illustrate the efficacy of the proposed RHM in applications such as subspace robust Wasserstein distance, robust training of neural networks, and generative adversarial networks.
    IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency. (arXiv:2308.12871v1 [cs.DC])
    Efficiently optimizing multi-model inference pipelines for fast, accurate, and cost-effective inference is a crucial challenge in ML production systems, given their tight end-to-end latency requirements. To simplify the exploration of the vast and intricate trade-off space of accuracy and cost in inference pipelines, providers frequently opt to consider one of them. However, the challenge lies in reconciling accuracy and cost trade-offs. To address this challenge and propose a solution to efficiently manage model variants in inference pipelines, we present IPA, an online deep-learning Inference Pipeline Adaptation system that efficiently leverages model variants for each deep learning task. Model variants are different versions of pre-trained models for the same deep learning task with variations in resource requirements, latency, and accuracy. IPA dynamically configures batch size, replication, and model variants to optimize accuracy, minimize costs, and meet user-defined latency SLAs using Integer Programming. It supports multi-objective settings for achieving different trade-offs between accuracy and cost objectives while remaining adaptable to varying workloads and dynamic traffic patterns. Extensive experiments on a Kubernetes implementation with five real-world inference pipelines demonstrate that IPA improves normalized accuracy by up to 35% with a minimal cost increase of less than 5%.
    Renormalizing Diffusion Models. (arXiv:2308.12355v1 [hep-th])
    We explain how to use diffusion models to learn inverse renormalization group flows of statistical and quantum field theories. Diffusion models are a class of machine learning models which have been used to generate samples from complex distributions, such as the distribution of natural images, by learning the inverse process to a diffusion process which adds noise to the data until the distribution of the data is pure noise. Nonperturbative renormalization group schemes can naturally be written as diffusion processes in the space of fields. We combine these observations in a concrete framework for building ML-based models for studying field theories, in which the models learn the inverse process to an explicitly-specified renormalization group scheme. We detail how these models define a class of adaptive bridge (or parallel tempering) samplers for lattice field theory. Because renormalization group schemes have a physical meaning, we provide explicit prescriptions for how to compare results derived from models associated to several different renormalization group schemes of interest. We also explain how to use diffusion models in a variational method to find ground states of quantum systems. We apply some of our methods to numerically find RG flows of interacting statistical field theories. From the perspective of machine learning, our work provides an interpretation of multiscale diffusion models, and gives physically-inspired suggestions for diffusion models which should have novel properties.
    Predicting Drug Solubility Using Different Machine Learning Methods -- Linear Regression Model with Extracted Chemical Features vs Graph Convolutional Neural Network. (arXiv:2308.12325v1 [q-bio.QM])
    Predicting the solubility of given molecules is an important task in the pharmaceutical industry, and consequently this is a well-studied topic. In this research, we revisited this problem with the advantage of modern computing resources. We applied two machine learning models, a linear regression model and a graph convolutional neural network model, on multiple experimental datasets. Both methods can make reasonable predictions while the GCNN model had the best performance. However, the current GCNN model is a black box, while feature importance analysis from the linear regression model offers more insights into the underlying chemical influences. Using the linear regression model, we show how each functional group affects the overall solubility. Ultimately, knowing how chemical structure influences chemical properties is crucial when designing new drugs. Future work should aim to combine the high performance of GCNNs with the interpretability of linear regression, unlocking new advances in next generation high throughput screening.
    FOSA: Full Information Maximum Likelihood (FIML) Optimized Self-Attention Imputation for Missing Data. (arXiv:2308.12388v1 [cs.LG])
    In data imputation, effectively addressing missing values is pivotal, especially in intricate datasets. This paper delves into the FIML Optimized Self-attention (FOSA) framework, an innovative approach that amalgamates the strengths of Full Information Maximum Likelihood (FIML) estimation with the capabilities of self-attention neural networks. Our methodology commences with an initial estimation of missing values via FIML, subsequently refining these estimates by leveraging the self-attention mechanism. Our comprehensive experiments on both simulated and real-world datasets underscore FOSA's pronounced advantages over traditional FIML techniques, encapsulating facets of accuracy, computational efficiency, and adaptability to diverse data structures. Intriguingly, even in scenarios where the Structural Equation Model (SEM) might be mis-specified, leading to suboptimal FIML estimates, the robust architecture of FOSA's self-attention component adeptly rectifies and optimizes the imputation outcomes. Our empirical tests reveal that FOSA consistently delivers commendable predictions, even in the face of up to 40% random missingness, highlighting its robustness and potential for wide-scale applications in data imputation.
    Towards Top-Down Automated Development in Limited Scopes: A Neuro-Symbolic Framework from Expressibles to Executables. (arXiv:2209.01566v4 [cs.SE] UPDATED)
    Deep code generation is a topic of deep learning for software engineering (DL4SE), which adopts neural models to generate code for the intended functions. Since end-to-end neural methods lack domain knowledge and software hierarchy awareness, they tend to perform poorly w.r.t project-level tasks. To systematically explore the potential improvements of code generation, we let it participate in the whole top-down development from \emph{expressibles} to \emph{executables}, which is possible in limited scopes. In the process, it benefits from massive samples, features, and knowledge. As the foundation, we suggest building a taxonomy on code data, namely code taxonomy, leveraging the categorization of code information. Moreover, we introduce a three-layer semantic pyramid (SP) to associate text data and code data. It identifies the information of different abstraction levels, and thus introduces the domain knowledge on development and reveals the hierarchy of software. Furthermore, we propose a semantic pyramid framework (SPF) as the approach, focusing on software of high modularity and low complexity. SPF divides the code generation process into stages and reserves spots for potential interactions. In addition, we conceived preliminary applications in software development to confirm the neuro-symbolic framework.
    Towards Realistic Unsupervised Fine-tuning with CLIP. (arXiv:2308.12919v1 [cs.CV])
    The emergence of vision-language models (VLMs), such as CLIP, has spurred a significant research effort towards their application for downstream supervised learning tasks. Although some previous studies have explored the unsupervised fine-tuning of CLIP, they often rely on prior knowledge in the form of class names associated with ground truth labels. In this paper, we delve into a realistic unsupervised fine-tuning scenario by assuming that the unlabeled data might contain out-of-distribution samples from unknown classes. Furthermore, we emphasize the importance of simultaneously enhancing out-of-distribution detection capabilities alongside the recognition of instances associated with predefined class labels. To tackle this problem, we present a simple, efficient, and effective fine-tuning approach called Universal Entropy Optimization (UEO). UEO leverages sample-level confidence to approximately minimize the conditional entropy of confident instances and maximize the marginal entropy of less confident instances. Apart from optimizing the textual prompts, UEO also incorporates optimization of channel-wise affine transformations within the visual branch of CLIP. Through extensive experiments conducted across 15 domains and 4 different types of prior knowledge, we demonstrate that UEO surpasses baseline methods in terms of both generalization and out-of-distribution detection.
    Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward. (arXiv:2308.12772v1 [cs.RO])
    Robot control using reinforcement learning has become popular, but its learning process generally terminates halfway through an episode for safety and time-saving reasons. This study addresses the problem of the most popular exception handling that temporal-difference (TD) learning performs at such termination. That is, by forcibly assuming zero value after termination, unintentionally implicit underestimation or overestimation occurs, depending on the reward design in the normal states. When the episode is terminated due to task failure, the failure may be highly valued with the unintentional overestimation, and the wrong policy may be acquired. Although this problem can be avoided by paying attention to the reward design, it is essential in practical use of TD learning to review the exception handling at termination. This paper therefore proposes a method to intentionally underestimate the value after termination to avoid learning failures due to the unintentional overestimation. In addition, the degree of underestimation is adjusted according to the degree of stationarity at termination, thereby preventing excessive exploration due to the intentional underestimation. Simulations and real robot experiments showed that the proposed method can stably obtain the optimal policies for various tasks and reward designs. https://youtu.be/AxXr8uFOe7M
    Augmenting Reinforcement Learning with Transformer-based Scene Representation Learning for Decision-making of Autonomous Driving. (arXiv:2208.12263v2 [cs.LG] UPDATED)
    Decision-making for urban autonomous driving is challenging due to the stochastic nature of interactive traffic participants and the complexity of road structures. Although reinforcement learning (RL)-based decision-making scheme is promising to handle urban driving scenarios, it suffers from low sample efficiency and poor adaptability. In this paper, we propose Scene-Rep Transformer to improve the RL decision-making capabilities with better scene representation encoding and sequential predictive latent distillation. Specifically, a multi-stage Transformer (MST) encoder is constructed to model not only the interaction awareness between the ego vehicle and its neighbors but also intention awareness between the agents and their candidate routes. A sequential latent Transformer (SLT) with self-supervised learning objectives is employed to distill the future predictive information into the latent scene representation, in order to reduce the exploration space and speed up training. The final decision-making module based on soft actor-critic (SAC) takes as input the refined latent scene representation from the Scene-Rep Transformer and outputs driving actions. The framework is validated in five challenging simulated urban scenarios with dense traffic, and its performance is manifested quantitatively by the substantial improvements in data efficiency and performance in terms of success rate, safety, and efficiency. The qualitative results reveal that our framework is able to extract the intentions of neighbor agents to help make decisions and deliver more diversified driving behaviors.
    An Efficient Distributed Multi-Agent Reinforcement Learning for EV Charging Network Control. (arXiv:2308.12921v1 [cs.MA])
    The increasing trend in adopting electric vehicles (EVs) will significantly impact the residential electricity demand, which results in an increased risk of transformer overload in the distribution grid. To mitigate such risks, there are urgent needs to develop effective EV charging controllers. Currently, the majority of the EV charge controllers are based on a centralized approach for managing individual EVs or a group of EVs. In this paper, we introduce a decentralized Multi-agent Reinforcement Learning (MARL) charging framework that prioritizes the preservation of privacy for EV owners. We employ the Centralized Training Decentralized Execution-Deep Deterministic Policy Gradient (CTDE-DDPG) scheme, which provides valuable information to users during training while maintaining privacy during execution. Our results demonstrate that the CTDE framework improves the performance of the charging network by reducing the network costs. Moreover, we show that the Peak-to-Average Ratio (PAR) of the total demand is reduced, which, in turn, reduces the risk of transformer overload during the peak hours.
    PruMUX: Augmenting Data Multiplexing with Model Compression. (arXiv:2305.14706v2 [cs.LG] UPDATED)
    As language models increase in size by the day, methods for efficient inference are critical to leveraging their capabilities for various applications. Prior work has investigated techniques like model pruning, knowledge distillation, and data multiplexing to increase model throughput without sacrificing accuracy. In this paper, we combine two such methods -- structured pruning and data multiplexing -- to compound the speedup gains obtained by either method. Our approach, PruMUX, obtains up to 7.5-29.5X throughput improvement over BERT-base model with accuracy threshold from 80% to 74%. We further study various combinations of parameters (such as sparsity and multiplexing factor) in the two techniques to provide a comprehensive analysis of the tradeoff between accuracy and throughput in the resulting models. We then propose Auto-PruMUX, a meta-level model that can predict the high-performance parameters for pruning and multiplexing given a desired accuracy loss budget, providing a practical method to leverage the combination effectively.
    UNISOUND System for VoxCeleb Speaker Recognition Challenge 2023. (arXiv:2308.12526v1 [eess.AS])
    This report describes the UNISOUND submission for Track1 and Track2 of VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC 2023). We submit the same system on Track 1 and Track 2, which is trained with only VoxCeleb2-dev. Large-scale ResNet and RepVGG architectures are developed for the challenge. We propose a consistency-aware score calibration method, which leverages the stability of audio voiceprints in similarity score by a Consistency Measure Factor (CMF). CMF brings a huge performance boost in this challenge. Our final system is a fusion of six models and achieves the first place in Track 1 and second place in Track 2 of VoxSRC 2023. The minDCF of our submission is 0.0855 and the EER is 1.5880%.
    A temporally and spatially local spike-based backpropagation algorithm to enable training in hardware. (arXiv:2207.09755v2 [cs.NE] UPDATED)
    Spiking Neural Networks (SNNs) have emerged as a hardware efficient architecture for classification tasks. The challenge of spike-based encoding has been the lack of a universal training mechanism performed entirely using spikes. There have been several attempts to adopt the powerful backpropagation (BP) technique used in non-spiking artificial neural networks (ANN): (1) SNNs can be trained by externally computed numerical gradients. (2) A major advancement towards native spike-based learning has been the use of approximate Backpropagation using spike-time dependent plasticity (STDP) with phased forward/backward passes. However, the transfer of information between such phases for gradient and weight update calculation necessitates external memory and computational access. This is a challenge for standard neuromorphic hardware implementations. In this paper, we propose a stochastic SNN based Back-Prop (SSNN-BP) algorithm that utilizes a composite neuron to simultaneously compute the forward pass activations and backward pass gradients explicitly with spikes. Although signed gradient values are a challenge for spike-based representation, we tackle this by splitting the gradient signal into positive and negative streams. We show that our method approaches BP ANN baseline with sufficiently long spike-trains. Finally, we show that the well-performing softmax cross-entropy loss function can be implemented through inhibitory lateral connections enforcing a Winner Take All (WTA) rule. Our SNN with a 2-layer network shows excellent generalization through comparable performance to ANNs with equivalent architecture and regularization parameters on static image datasets like MNIST, Fashion-MNIST, Extended MNIST, and temporally encoded image datasets like Neuromorphic MNIST datasets. Thus, SSNN-BP enables BP compatible with purely spike-based neuromorphic hardware.
    Algorithmic progress in computer vision. (arXiv:2212.05153v4 [cs.CV] UPDATED)
    We investigate algorithmic progress in image classification on ImageNet, perhaps the most well-known test bed for computer vision. We estimate a model, informed by work on neural scaling laws, and infer a decomposition of progress into the scaling of compute, data, and algorithms. Using Shapley values to attribute performance improvements, we find that algorithmic improvements have been roughly as important as the scaling of compute for progress computer vision. Our estimates indicate that algorithmic innovations mostly take the form of compute-augmenting algorithmic advances (which enable researchers to get better performance from less compute), not data-augmenting algorithmic advances. We find that compute-augmenting algorithmic advances are made at a pace more than twice as fast as the rate usually associated with Moore's law. In particular, we estimate that compute-augmenting innovations halve compute requirements every nine months (95\% confidence interval: 4 to 25 months).
    Not Only Rewards But Also Constraints: Applications on Legged Robot Locomotion. (arXiv:2308.12517v1 [cs.RO])
    Several earlier studies have shown impressive control performance in complex robotic systems by designing the controller using a neural network and training it with model-free reinforcement learning. However, these outstanding controllers with natural motion style and high task performance are developed through extensive reward engineering, which is a highly laborious and time-consuming process of designing numerous reward terms and determining suitable reward coefficients. In this work, we propose a novel reinforcement learning framework for training neural network controllers for complex robotic systems consisting of both rewards and constraints. To let the engineers appropriately reflect their intent to constraints and handle them with minimal computation overhead, two constraint types and an efficient policy optimization algorithm are suggested. The learning framework is applied to train locomotion controllers for several legged robots with different morphology and physical attributes to traverse challenging terrains. Extensive simulation and real-world experiments demonstrate that performant controllers can be trained with significantly less reward engineering, by tuning only a single reward coefficient. Furthermore, a more straightforward and intuitive engineering process can be utilized, thanks to the interpretability and generalizability of constraints. The summary video is available at https://youtu.be/KAlm3yskhvM.
    On the Generalization of PINNs outside the training domain and the Hyperparameters influencing it. (arXiv:2302.07557v2 [cs.LG] UPDATED)
    Physics-Informed Neural Networks (PINNs) are Neural Network architectures trained to emulate solutions of differential equations without the necessity of solution data. They are currently ubiquitous in the scientific literature due to their flexible and promising settings. However, very little of the available research provides practical studies that aim for a better quantitative understanding of such architecture and its functioning. In this paper, we perform an empirical analysis of the behavior of PINN predictions outside their training domain. The primary goal is to investigate the scenarios in which a PINN can provide consistent predictions outside the training area. Thereinafter, we assess whether the algorithmic setup of PINNs can influence their potential for generalization and showcase the respective effect on the prediction. The results obtained in this study returns insightful and at times counterintuitive perspectives which can be highly relevant for architectures which combines PINNs with domain decomposition and/or adaptive training strategies.
    Single-shot Bayesian approximation for neural networks. (arXiv:2308.12785v1 [cs.LG])
    Deep neural networks (NNs) are known for their high-prediction performances. However, NNs are prone to yield unreliable predictions when encountering completely new situations without indicating their uncertainty. Bayesian variants of NNs (BNNs), such as Monte Carlo (MC) dropout BNNs, do provide uncertainty measures and simultaneously increase the prediction performance. The only disadvantage of BNNs is their higher computation time during test time because they rely on a sampling approach. Here we present a single-shot MC dropout approximation that preserves the advantages of BNNs while being as fast as NNs. Our approach is based on moment propagation (MP) and allows to analytically approximate the expected value and the variance of the MC dropout signal for commonly used layers in NNs, i.e. convolution, max pooling, dense, softmax, and dropout layers. The MP approach can convert an NN into a BNN without re-training given the NN has been trained with standard dropout. We evaluate our approach on different benchmark datasets and a simulated toy example in a classification and regression setting. We demonstrate that our single-shot MC dropout approximation resembles the point estimate and the uncertainty estimate of the predictive distribution that is achieved with an MC approach, while being fast enough for real-time deployments of BNNs. We show that using part of the saved time to combine our MP approach with deep ensemble techniques does further improve the uncertainty measures.
    Trustworthy Representation Learning Across Domains. (arXiv:2308.12315v1 [cs.LG])
    As AI systems have obtained significant performance to be deployed widely in our daily live and human society, people both enjoy the benefits brought by these technologies and suffer many social issues induced by these systems. To make AI systems good enough and trustworthy, plenty of researches have been done to build guidelines for trustworthy AI systems. Machine learning is one of the most important parts for AI systems and representation learning is the fundamental technology in machine learning. How to make the representation learning trustworthy in real-world application, e.g., cross domain scenarios, is very valuable and necessary for both machine learning and AI system fields. Inspired by the concepts in trustworthy AI, we proposed the first trustworthy representation learning across domains framework which includes four concepts, i.e, robustness, privacy, fairness, and explainability, to give a comprehensive literature review on this research direction. Specifically, we first introduce the details of the proposed trustworthy framework for representation learning across domains. Second, we provide basic notions and comprehensively summarize existing methods for the trustworthy framework from four concepts. Finally, we conclude this survey with insights and discussions on future research directions.
    Optimizing Neural Network Scale for ECG Classification. (arXiv:2308.12492v1 [cs.LG])
    We study scaling convolutional neural networks (CNNs), specifically targeting Residual neural networks (ResNet), for analyzing electrocardiograms (ECGs). Although ECG signals are time-series data, CNN-based models have been shown to outperform other neural networks with different architectures in ECG analysis. However, most previous studies in ECG analysis have overlooked the importance of network scaling optimization, which significantly improves performance. We explored and demonstrated an efficient approach to scale ResNet by examining the effects of crucial parameters, including layer depth, the number of channels, and the convolution kernel size. Through extensive experiments, we found that a shallower network, a larger number of channels, and smaller kernel sizes result in better performance for ECG classifications. The optimal network scale might differ depending on the target task, but our findings provide insight into obtaining more efficient and accurate models with fewer computing resources or less time. In practice, we demonstrate that a narrower search space based on our findings leads to higher performance.
    Test-Time Adaptation for Visual Document Understanding. (arXiv:2206.07240v2 [cs.CV] UPDATED)
    For visual document understanding (VDU), self-supervised pretraining has been shown to successfully generate transferable representations, yet, effective adaptation of such representations to distribution shifts at test-time remains to be an unexplored area. We propose DocTTA, a novel test-time adaptation method for documents, that does source-free domain adaptation using unlabeled target document data. DocTTA leverages cross-modality self-supervised learning via masked visual language modeling, as well as pseudo labeling to adapt models learned on a \textit{source} domain to an unlabeled \textit{target} domain at test time. We introduce new benchmarks using existing public datasets for various VDU tasks, including entity recognition, key-value extraction, and document visual question answering. DocTTA shows significant improvements on these compared to the source model performance, up to 1.89\% in (F1 score), 3.43\% (F1 score), and 17.68\% (ANLS score), respectively. Our benchmark datasets are available at \url{https://saynaebrahimi.github.io/DocTTA.html}.
    Machine Learning Small Molecule Properties in Drug Discovery. (arXiv:2308.12354v1 [q-bio.BM])
    Machine learning (ML) is a promising approach for predicting small molecule properties in drug discovery. Here, we provide a comprehensive overview of various ML methods introduced for this purpose in recent years. We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity). We discuss existing popular datasets and molecular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks. We highlight also challenges of predicting and optimizing multiple properties during hit-to-lead and lead optimization stages of drug discovery and explore briefly possible multi-objective optimization techniques that can be used to balance diverse properties while optimizing lead candidates. Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery are assessed. Overall, this review provides insights into the landscape of ML models for small molecule property predictions in drug discovery. So far, there are multiple diverse approaches, but their performances are often comparable. Neural networks, while more flexible, do not always outperform simpler models. This shows that the availability of high-quality training data remains crucial for training accurate models and there is a need for standardized benchmarks, additional performance metrics, and best practices to enable richer comparisons between the different techniques and models that can shed a better light on the differences between the many techniques.
    Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints. (arXiv:2308.12680v1 [cs.LG])
    We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints, which, to the best of our knowledge, is the first combinatorial bandits setting considering diversity constraints under bandit feedback. Specifically, to efficiently explore the combinatorial and constrained action space, we introduce six slave models with distinguished merits to generate diversified samples well balancing rewards and constraints as well as efficiency. Moreover, we propose teacher learning based optimization and the policy co-training technique to boost the performance of the multiple slave models. The master model then collects the elite samples provided by the slave models and selects the best sample estimated by a neural contextual UCB-based network to make a decision with a trade-off between exploration and exploitation. Thanks to the elaborate design of slave models, the co-training mechanism among slave models, and the novel interactions between the master and slave models, our approach significantly surpasses existing state-of-the-art algorithms in both synthetic and real datasets for recommendation tasks. The code is available at: \url{https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits}.
    Farm-wide virtual load monitoring for offshore wind structures via Bayesian neural networks. (arXiv:2211.00642v2 [cs.LG] UPDATED)
    Offshore wind structures are subject to deterioration mechanisms throughout their operational lifetime. Even if the deterioration evolution of structural elements can be estimated through physics-based deterioration models, the uncertainties involved in the process hurdle the selection of lifecycle management decisions. In this scenario, the collection of relevant information through an efficient monitoring system enables the reduction of uncertainties, ultimately driving more optimal lifecycle decisions. However, a full monitoring instrumentation implemented on all wind turbines in a farm might become unfeasible due to practical and economical constraints. Besides, certain load monitoring systems often become defective after a few years of marine environment exposure. Addressing the aforementioned concerns, a farm-wide virtual load monitoring scheme directed by a fleet-leader wind turbine offers an attractive solution. Fetched with data retrieved from a fully-instrumented wind turbine, a model can be trained and then deployed, thus yielding load predictions of non-fully monitored wind turbines, from which only standard data remains available. In this paper, we propose a virtual load monitoring framework formulated via Bayesian neural networks (BNNs) and we provide relevant implementation details needed for the construction, training, and deployment of BNN data-based virtual monitoring models. As opposed to their deterministic counterparts, BNNs intrinsically announce the uncertainties associated with generated load predictions and allow to detect inaccurate load estimations generated for non-fully monitored wind turbines. The proposed virtual load monitoring is thoroughly tested through an experimental campaign in an operational offshore wind farm and the results demonstrate the effectiveness of BNN models for fleet-leader-based farm-wide virtual monitoring.
    Efficient-Adam: Communication-Efficient Distributed Adam. (arXiv:2205.14473v2 [cs.LG] UPDATED)
    Distributed adaptive stochastic gradient methods have been widely used for large-scale nonconvex optimization, such as training deep learning models. However, their communication complexity on finding $\varepsilon$-stationary points has rarely been analyzed in the nonconvex setting. In this work, we present a novel communication-efficient distributed Adam in the parameter-server model for stochastic nonconvex optimization, dubbed {\em Efficient-Adam}. Specifically, we incorporate a two-way quantization scheme into Efficient-Adam to reduce the communication cost between the workers and server. Simultaneously, we adopt a two-way error feedback strategy to reduce the biases caused by the two-way quantization on both the server and workers, respectively. In addition, we establish the iteration complexity for the proposed Efficient-Adam with a class of quantization operators, and further characterize its communication complexity between the server and workers when an $\varepsilon$-stationary point is achieved. Finally, we apply Efficient-Adam to solve a toy stochastic convex optimization problem and train deep learning models on real-world vision and language tasks. Extensive experiments together with a theoretical guarantee justify the merits of Efficient Adam.
    Graph Neural Stochastic Differential Equations. (arXiv:2308.12316v1 [cs.LG])
    We present a novel model Graph Neural Stochastic Differential Equations (Graph Neural SDEs). This technique enhances the Graph Neural Ordinary Differential Equations (Graph Neural ODEs) by embedding randomness into data representation using Brownian motion. This inclusion allows for the assessment of prediction uncertainty, a crucial aspect frequently missed in current models. In our framework, we spotlight the \textit{Latent Graph Neural SDE} variant, demonstrating its effectiveness. Through empirical studies, we find that Latent Graph Neural SDEs surpass conventional models like Graph Convolutional Networks and Graph Neural ODEs, especially in confidence prediction, making them superior in handling out-of-distribution detection across both static and spatio-temporal contexts.
    SafeAR: Towards Safer Algorithmic Recourse by Risk-Aware Policies. (arXiv:2308.12367v1 [cs.LG])
    With the growing use of machine learning (ML) models in critical domains such as finance and healthcare, the need to offer recourse for those adversely affected by the decisions of ML models has become more important; individuals ought to be provided with recommendations on actions to take for improving their situation and thus receive a favorable decision. Prior work on sequential algorithmic recourse -- which recommends a series of changes -- focuses on action feasibility and uses the proximity of feature changes to determine action costs. However, the uncertainties of feature changes and the risk of higher than average costs in recourse have not been considered. It is undesirable if a recourse could (with some probability) result in a worse situation from which recovery requires an extremely high cost. It is essential to incorporate risks when computing and evaluating recourse. We call the recourse computed with such risk considerations as Safer Algorithmic Recourse (SafeAR). The objective is to empower people to choose a recourse based on their risk tolerance. In this work, we discuss and show how existing recourse desiderata can fail to capture the risk of higher costs. We present a method to compute recourse policies that consider variability in cost and connect algorithmic recourse literature with risk-sensitive reinforcement learning. We also adopt measures ``Value at Risk'' and ``Conditional Value at Risk'' from the financial literature to summarize risk concisely. We apply our method to two real-world datasets and compare policies with different levels of risk-aversion using risk measures and recourse desiderata (sparsity and proximity).
  • Open

    Unifying Gradients to Improve Real-world Robustness for Deep Networks. (arXiv:2208.06228v2 [stat.ML] UPDATED)
    The wide application of deep neural networks (DNNs) demands an increasing amount of attention to their real-world robustness, i.e., whether a DNN resists black-box adversarial attacks, among which score-based query attacks (SQAs) are most threatening since they can effectively hurt a victim network with the only access to model outputs. Defending against SQAs requires a slight but artful variation of outputs due to the service purpose for users, who share the same output information with SQAs. In this paper, we propose a real-world defense by Unifying Gradients (UniG) of different data so that SQAs could only probe a much weaker attack direction that is similar for different samples. Since such universal attack perturbations have been validated as less aggressive than the input-specific perturbations, UniG protects real-world DNNs by indicating attackers a twisted and less informative attack direction. We implement UniG efficiently by a Hadamard product module which is plug-and-play. According to extensive experiments on 5 SQAs, 2 adaptive attacks and 7 defense baselines, UniG significantly improves real-world robustness without hurting clean accuracy on CIFAR10 and ImageNet. For instance, UniG maintains a model of 77.80% accuracy under 2500-query Square attack while the state-of-the-art adversarially-trained model only has 67.34% on CIFAR10. Simultaneously, UniG outperforms all compared baselines in terms of clean accuracy and achieves the smallest modification of the model output. The code is released at https://github.com/snowien/UniG-pytorch.
    Prediction without Preclusion: Recourse Verification with Reachable Sets. (arXiv:2308.12820v1 [cs.LG])
    Machine learning models are often used to decide who will receive a loan, a job interview, or a public benefit. Standard techniques to build these models use features about people but overlook their actionability. In turn, models can assign predictions that are fixed, meaning that consumers who are denied loans, interviews, or benefits may be permanently locked out from access to credit, employment, or assistance. In this work, we introduce a formal testing procedure to flag models that assign fixed predictions that we call recourse verification. We develop machinery to reliably determine if a given model can provide recourse to its decision subjects from a set of user-specified actionability constraints. We demonstrate how our tools can ensure recourse and adversarial robustness in real-world datasets and use them to study the infeasibility of recourse in real-world lending datasets. Our results highlight how models can inadvertently assign fixed predictions that permanently bar access, and we provide tools to design algorithms that account for actionability when developing models.
    Multi-fidelity Fourier Neural Operator for Fast Modeling of Large-Scale Geological Carbon Storage. (arXiv:2308.09113v2 [stat.ML] UPDATED)
    Deep learning-based surrogate models have been widely applied in geological carbon storage (GCS) problems to accelerate the prediction of reservoir pressure and CO2 plume migration. Large amounts of data from physics-based numerical simulators are required to train a model to accurately predict the complex physical behaviors associated with this process. In practice, the available training data are always limited in large-scale 3D problems due to the high computational cost. Therefore, we propose to use a multi-fidelity Fourier Neural Operator to solve large-scale GCS problems with more affordable multi-fidelity training datasets. The Fourier Neural Operator has a desirable grid-invariant property, which simplifies the transfer learning procedure between datasets with different discretization. We first test the model efficacy on a GCS reservoir model being discretized into 110k grid cells. The multi-fidelity model can predict with accuracy comparable to a high-fidelity model trained with the same amount of high-fidelity data with 81% less data generation costs. We further test the generalizability of the multi-fidelity model on a same reservoir model with a finer discretization of 1 million grid cells. This case was made more challenging by employing high-fidelity and low-fidelity datasets generated by different geostatistical models and reservoir simulators. We observe that the multi-fidelity FNO model can predict pressure fields with reasonable accuracy even when the high-fidelity data are extremely limited.
    A Greedy Approach for Offering to Telecom Subscribers. (arXiv:2308.12606v1 [stat.ML])
    Customer retention or churn prevention is a challenging task of a telecom operator. One of the effective approaches is to offer some attractive incentive or additional services or money to the subscribers for keeping them engaged and make sure they stay in the operator's network for longer time. Often, operators allocate certain amount of monetary budget to carry out the offer campaign. The difficult part of this campaign is the selection of a set of customers from a large subscriber-base and deciding the amount that should be offered to an individual so that operator's objective is achieved. There may be multiple objectives (e.g., maximizing revenue, minimizing number of churns) for selection of subscriber and selection of an offer to the selected subscriber. Apart from monetary benefit, offers may include additional data, SMS, hots-spot tethering, and many more. This problem is known as offer optimization. In this paper, we propose a novel combinatorial algorithm for solving offer optimization under heterogeneous offers by maximizing expected revenue under the scenario of subscriber churn, which is, in general, seen in telecom domain. The proposed algorithm is efficient and accurate even for a very large subscriber-base.
    Demographic Parity Constrained Minimax Optimal Regression under Linear Model. (arXiv:2206.11546v3 [math.ST] UPDATED)
    We explore the minimax optimal error associated with a demographic parity-constrained regression problem within the context of a linear model. Our proposed model encompasses a broader range of discriminatory bias sources compared to the model presented by Chzhen and Schreuder (2022). Our analysis reveals that the minimax optimal error for the demographic parity-constrained regression problem under our model is characterized by $\Theta(\frac{dM}{n})$, where $n$ denotes the sample size, $d$ represents the dimensionality, and $M$ signifies the number of demographic groups arising from sensitive attributes. Moreover, we demonstrate that the minimax error increases in conjunction with a larger bias present in the model.
    Single-shot Bayesian approximation for neural networks. (arXiv:2308.12785v1 [cs.LG])
    Deep neural networks (NNs) are known for their high-prediction performances. However, NNs are prone to yield unreliable predictions when encountering completely new situations without indicating their uncertainty. Bayesian variants of NNs (BNNs), such as Monte Carlo (MC) dropout BNNs, do provide uncertainty measures and simultaneously increase the prediction performance. The only disadvantage of BNNs is their higher computation time during test time because they rely on a sampling approach. Here we present a single-shot MC dropout approximation that preserves the advantages of BNNs while being as fast as NNs. Our approach is based on moment propagation (MP) and allows to analytically approximate the expected value and the variance of the MC dropout signal for commonly used layers in NNs, i.e. convolution, max pooling, dense, softmax, and dropout layers. The MP approach can convert an NN into a BNN without re-training given the NN has been trained with standard dropout. We evaluate our approach on different benchmark datasets and a simulated toy example in a classification and regression setting. We demonstrate that our single-shot MC dropout approximation resembles the point estimate and the uncertainty estimate of the predictive distribution that is achieved with an MC approach, while being fast enough for real-time deployments of BNNs. We show that using part of the saved time to combine our MP approach with deep ensemble techniques does further improve the uncertainty measures.
    On Uniformly Optimal Algorithms for Best Arm Identification in Two-Armed Bandits with Fixed Budget. (arXiv:2308.12000v2 [stat.ML] UPDATED)
    We study the problem of best-arm identification with fixed budget in stochastic two-arm bandits with Bernoulli rewards. We prove that surprisingly, there is no algorithm that (i) performs as well as the algorithm sampling each arm equally (this algorithm is referred to as the {\it uniform sampling} algorithm) on all instances, and that (ii) strictly outperforms this algorithm on at least one instance. In short, there is no algorithm better than the uniform sampling algorithm. Towards this result, we introduce the natural class of {\it consistent} and {\it stable} algorithms, and show that any algorithm that performs as well as the uniform sampling algorithm on all instances belongs to this class. The proof is completed by deriving a lower bound on the error rate satisfied by any consistent and stable algorithm, and by showing that the uniform sampling algorithm matches this lower bound. Our results provide a solution to the two open problems presented in \cite{qin2022open}.
    Fat Shattering, Joint Measurability, and PAC Learnability of POVM Hypothesis Classes. (arXiv:2308.12304v1 [stat.ML])
    We characterize learnability for quantum measurement classes by establishing matching necessary and sufficient conditions for their PAC learnability, along with corresponding sample complexity bounds, in the setting where the learner is given access only to prepared quantum states. We first probe the results from previous works on this setting. We show that the empirical risk defined in previous works and matching the definition in the classical theory fails to satisfy the uniform convergence property enjoyed in the classical setting for some learnable classes. Moreover, we show that VC dimension generalization upper bounds in previous work are frequently infinite, even for finite-dimensional POVM classes. To surmount the failure of the standard ERM to satisfy uniform convergence, we define a new learning rule -- denoised ERM. We show this to be a universal learning rule for POVM and probabilistically observed concept classes, and the condition for it to satisfy uniform convergence is finite fat shattering dimension of the class. We give quantitative sample complexity upper and lower bounds for learnability in terms of finite fat-shattering dimension and a notion of approximate finite partitionability into approximately jointly measurable subsets, which allow for sample reuse. We then show that finite fat shattering dimension implies finite coverability by approximately jointly measurable subsets, leading to our matching conditions. We also show that every measurement class defined on a finite-dimensional Hilbert space is PAC learnable. We illustrate our results on several example POVM classes.
    Exact Manifold Gaussian Variational Bayes. (arXiv:2210.14598v3 [stat.ML] UPDATED)
    We propose an optimization algorithm for Variational Inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemann manifold. We develop an efficient algorithm for Gaussian Variational Inference that implicitly satisfies the positive definite constraint on the variational covariance matrix. Our Exact manifold Gaussian Variational Bayes (EMGVB) provides exact but simple update rules and is straightforward to implement. Due to its black-box nature, EMGVB stands as a ready-to-use solution for VI in complex models. Over five datasets, we empirically validate our feasible approach on different statistical, econometric, and deep learning models, discussing its performance with respect to baseline methods.
    An Intentional Forgetting-Driven Self-Healing Method For Deep Reinforcement Learning Systems. (arXiv:2308.12445v1 [cs.LG])
    Deep reinforcement learning (DRL) is increasingly applied in large-scale productions like Netflix and Facebook. As with most data-driven systems, DRL systems can exhibit undesirable behaviors due to environmental drifts, which often occur in constantly-changing production settings. Continual Learning (CL) is the inherent self-healing approach for adapting the DRL agent in response to the environment's conditions shifts. However, successive shifts of considerable magnitude may cause the production environment to drift from its original state. Recent studies have shown that these environmental drifts tend to drive CL into long, or even unsuccessful, healing cycles, which arise from inefficiencies such as catastrophic forgetting, warm-starting failure, and slow convergence. In this paper, we propose Dr. DRL, an effective self-healing approach for DRL systems that integrates a novel mechanism of intentional forgetting into vanilla CL to overcome its main issues. Dr. DRL deliberately erases the DRL system's minor behaviors to systematically prioritize the adaptation of the key problem-solving skills. Using well-established DRL algorithms, Dr. DRL is compared with vanilla CL on various drifted environments. Dr. DRL is able to reduce, on average, the healing time and fine-tuning episodes by, respectively, 18.74% and 17.72%. Dr. DRL successfully helps agents to adapt to 19.63% of drifted environments left unsolved by vanilla CL while maintaining and even enhancing by up to 45% the obtained rewards for drifted environments that are resolved by both approaches.
    Don't blame Dataset Shift! Shortcut Learning due to Gradients and Cross Entropy. (arXiv:2308.12553v1 [cs.LG])
    Common explanations for shortcut learning assume that the shortcut improves prediction under the training distribution but not in the test distribution. Thus, models trained via the typical gradient-based optimization of cross-entropy, which we call default-ERM, utilize the shortcut. However, even when the stable feature determines the label in the training distribution and the shortcut does not provide any additional information, like in perception tasks, default-ERM still exhibits shortcut learning. Why are such solutions preferred when the loss for default-ERM can be driven to zero using the stable feature alone? By studying a linear perception task, we show that default-ERM's preference for maximizing the margin leads to models that depend more on the shortcut than the stable feature, even without overparameterization. This insight suggests that default-ERM's implicit inductive bias towards max-margin is unsuitable for perception tasks. Instead, we develop an inductive bias toward uniform margins and show that this bias guarantees dependence only on the perfect stable feature in the linear perception task. We develop loss functions that encourage uniform-margin solutions, called margin control (MARG-CTRL). MARG-CTRL mitigates shortcut learning on a variety of vision and language tasks, showing that better inductive biases can remove the need for expensive two-stage shortcut-mitigating methods in perception tasks.
    Wasserstein Geodesic Generator for Conditional Distributions. (arXiv:2308.10145v2 [stat.ML] UPDATED)
    Generating samples given a specific label requires estimating conditional distributions. We derive a tractable upper bound of the Wasserstein distance between conditional distributions to lay the theoretical groundwork to learn conditional distributions. Based on this result, we propose a novel conditional generation algorithm where conditional distributions are fully characterized by a metric space defined by a statistical distance. We employ optimal transport theory to propose the Wasserstein geodesic generator, a new conditional generator that learns the Wasserstein geodesic. The proposed method learns both conditional distributions for observed domains and optimal transport maps between them. The conditional distributions given unobserved intermediate domains are on the Wasserstein geodesic between conditional distributions given two observed domain labels. Experiments on face images with light conditions as domain labels demonstrate the efficacy of the proposed method.
    A Data-Driven Approach to Morphogenesis under Structural Instability. (arXiv:2308.11846v1 [nlin.PS] CROSS LISTED)
    Morphological development into evolutionary patterns under structural instability is ubiquitous in living systems and often of vital importance for engineering structures. Here we propose a data-driven approach to understand and predict their spatiotemporal complexities. A machine-learning framework is proposed based on the physical modeling of morphogenesis triggered by internal or external forcing. Digital libraries of structural patterns are constructed from the simulation data, which are then used to recognize the abnormalities, predict their development, and assist in risk assessment and prognosis. The capabilities to identify the key bifurcation characteristics and predict the history-dependent development from the global and local features are demonstrated by examples of brain growth and aerospace structural design, which offer guidelines for disease diagnosis/prognosis and instability-tolerant design.
    Individual Privacy Accounting with Gaussian Differential Privacy. (arXiv:2209.15596v2 [cs.CR] UPDATED)
    Individual privacy accounting enables bounding differential privacy (DP) loss individually for each participant involved in the analysis. This can be informative as often the individual privacy losses are considerably smaller than those indicated by the DP bounds that are based on considering worst-case bounds at each data access. In order to account for the individual privacy losses in a principled manner, we need a privacy accountant for adaptive compositions of randomised mechanisms, where the loss incurred at a given data access is allowed to be smaller than the worst-case loss. This kind of analysis has been carried out for the R\'enyi differential privacy (RDP) by Feldman and Zrnic (2021), however not yet for the so-called optimal privacy accountants. We make first steps in this direction by providing a careful analysis using the Gaussian differential privacy which gives optimal bounds for the Gaussian mechanism, one of the most versatile DP mechanisms. This approach is based on determining a certain supermartingale for the hockey-stick divergence and on extending the R\'enyi divergence-based fully adaptive composition results by Feldman and Zrnic. We also consider measuring the individual $(\varepsilon,\delta)$-privacy losses using the so-called privacy loss distributions. With the help of the Blackwell theorem, we can then make use of the RDP analysis to construct an approximative individual $(\varepsilon,\delta)$-accountant.
    Conditional expectation using compactification operators. (arXiv:2306.10592v3 [stat.ML] UPDATED)
    The separate tasks of denoising, least squares expectation, and manifold learning can often be posed in a common setting of finding the conditional expectations arising from a product of two random variables. This paper focuses on this more general problem and describes an operator theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactification tool, to set up the estimation problem as a linear inverse problem in a reproducing kernel Hilbert space. This equation is shown to have solutions that allow numerical approximation, thus guaranteeing the convergence of data-driven implementations. The overall technique is easy to implement, and their successful application to some real-world problems are also shown.
    Exact Bayesian Inference on Discrete Models via Probability Generating Functions: A Probabilistic Programming Approach. (arXiv:2305.17058v2 [cs.PL] UPDATED)
    We present an exact Bayesian inference method for discrete statistical models, which can find exact solutions to many discrete inference problems, even with infinite support and continuous priors. To express such models, we introduce a probabilistic programming language that supports discrete and continuous sampling, discrete observations, affine functions, (stochastic) branching, and conditioning on events. Our key tool is probability generating functions: they provide a compact closed-form representation of distributions that are definable by programs, thus enabling the exact computation of posterior probabilities, expectation, variance, and higher moments. Our inference method is provably correct, fully automated and uses automatic differentiation (specifically, Taylor polynomials), but does not require computer algebra. Our experiments show that its performance on a range of real-world examples is competitive with approximate Monte Carlo methods, while avoiding approximation errors.  ( 2 min )
    Geodesic Mode Connectivity. (arXiv:2308.12666v1 [cs.LG])
    Mode connectivity is a phenomenon where trained models are connected by a path of low loss. We reframe this in the context of Information Geometry, where neural networks are studied as spaces of parameterized distributions with curved geometry. We hypothesize that shortest paths in these spaces, known as geodesics, correspond to mode-connecting paths in the loss landscape. We propose an algorithm to approximate geodesics and demonstrate that they achieve mode connectivity.  ( 2 min )
    Riemannian Hamiltonian methods for min-max optimization on manifolds. (arXiv:2204.11418v3 [math.OC] UPDATED)
    In this paper, we study min-max optimization problems on Riemannian manifolds. We introduce a Riemannian Hamiltonian function, minimization of which serves as a proxy for solving the original min-max problems. Under the Riemannian Polyak--{\L}ojasiewicz condition on the Hamiltonian function, its minimizer corresponds to the desired min-max saddle point. We also provide cases where this condition is satisfied. For geodesic-bilinear optimization in particular, solving the proxy problem leads to the correct search direction towards global optimality, which becomes challenging with the min-max formulation. To minimize the Hamiltonian function, we propose Riemannian Hamiltonian methods (RHM) and present their convergence analyses. We extend RHM to include consensus regularization and to the stochastic setting. We illustrate the efficacy of the proposed RHM in applications such as subspace robust Wasserstein distance, robust training of neural networks, and generative adversarial networks.  ( 2 min )
    Variational Information Pursuit with Large Language and Multimodal Models for Interpretable Predictions. (arXiv:2308.12562v1 [cs.LG])
    Variational Information Pursuit (V-IP) is a framework for making interpretable predictions by design by sequentially selecting a short chain of task-relevant, user-defined and interpretable queries about the data that are most informative for the task. While this allows for built-in interpretability in predictive models, applying V-IP to any task requires data samples with dense concept-labeling by domain experts, limiting the application of V-IP to small-scale tasks where manual data annotation is feasible. In this work, we extend the V-IP framework with Foundational Models (FMs) to address this limitation. More specifically, we use a two-step process, by first leveraging Large Language Models (LLMs) to generate a sufficiently large candidate set of task-relevant interpretable concepts, then using Large Multimodal Models to annotate each data sample by semantic similarity with each concept in the generated concept set. While other interpretable-by-design frameworks such as Concept Bottleneck Models (CBMs) require an additional step of removing repetitive and non-discriminative concepts to have good interpretability and test performance, we mathematically and empirically justify that, with a sufficiently informative and task-relevant query (concept) set, the proposed FM+V-IP method does not require any type of concept filtering. In addition, we show that FM+V-IP with LLM generated concepts can achieve better test performance than V-IP with human annotated concepts, demonstrating the effectiveness of LLMs at generating efficient query sets. Finally, when compared to other interpretable-by-design frameworks such as CBMs, FM+V-IP can achieve competitive test performance using fewer number of concepts/queries in both cases with filtered or unfiltered concept sets.  ( 3 min )
    Near Optimal Adversarial Attack on UCB Bandits. (arXiv:2008.09312v6 [cs.LG] UPDATED)
    I study a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. I propose a novel attack strategy that manipulates a learner employing the UCB algorithm into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\widehat{O}(\sqrt{\log T})$, where $T$ is the number of rounds. I also prove the first lower bound on the cumulative attack cost. The lower bound matches the upper bound up to $O(\log \log T)$ factors, showing the proposed attack strategy to be near optimal.  ( 2 min )
    Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints. (arXiv:2308.12680v1 [cs.LG])
    We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints, which, to the best of our knowledge, is the first combinatorial bandits setting considering diversity constraints under bandit feedback. Specifically, to efficiently explore the combinatorial and constrained action space, we introduce six slave models with distinguished merits to generate diversified samples well balancing rewards and constraints as well as efficiency. Moreover, we propose teacher learning based optimization and the policy co-training technique to boost the performance of the multiple slave models. The master model then collects the elite samples provided by the slave models and selects the best sample estimated by a neural contextual UCB-based network to make a decision with a trade-off between exploration and exploitation. Thanks to the elaborate design of slave models, the co-training mechanism among slave models, and the novel interactions between the master and slave models, our approach significantly surpasses existing state-of-the-art algorithms in both synthetic and real datasets for recommendation tasks. The code is available at: \url{https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits}.  ( 2 min )
    Improving multiple-try Metropolis with local balancing. (arXiv:2211.11613v2 [stat.CO] UPDATED)
    Multiple-try Metropolis (MTM) is a popular Markov chain Monte Carlo method with the appealing feature of being amenable to parallel computing. At each iteration, it samples several candidates for the next state of the Markov chain and randomly selects one of them based on a weight function. The canonical weight function is proportional to the target density. We show both theoretically and empirically that this weight function induces pathological behaviours in high dimensions, especially during the convergence phase. We propose to instead use weight functions akin to the locally-balanced proposal distributions of Zanella (2020), thus yielding MTM algorithms that do not exhibit those pathological behaviours. To theoretically analyse these algorithms, we study the high-dimensional performance of ideal schemes that can be thought of as MTM algorithms which sample an infinite number of candidates at each iteration, as well as the discrepancy between such schemes and the MTM algorithms which sample a finite number of candidates. Our analysis unveils a strong distinction between the convergence and stationary phases: in the former, local balancing is crucial and effective to achieve fast convergence, while in the latter, the canonical and novel weight functions yield similar performance. Numerical experiments include an application in precision medicine involving a computationally-expensive forward model, which makes the use of parallel computing within MTM iterations beneficial.  ( 3 min )
    Advancing Hungarian Text Processing with HuSpaCy: Efficient and Accurate NLP Pipelines. (arXiv:2308.12635v1 [cs.CL])
    This paper presents a set of industrial-grade text processing models for Hungarian that achieve near state-of-the-art performance while balancing resource efficiency and accuracy. Models have been implemented in the spaCy framework, extending the HuSpaCy toolkit with several improvements to its architecture. Compared to existing NLP tools for Hungarian, all of our pipelines feature all basic text processing steps including tokenization, sentence-boundary detection, part-of-speech tagging, morphological feature tagging, lemmatization, dependency parsing and named entity recognition with high accuracy and throughput. We thoroughly evaluated the proposed enhancements, compared the pipelines with state-of-the-art tools and demonstrated the competitive performance of the new models in all text preprocessing steps. All experiments are reproducible and the pipelines are freely available under a permissive license.  ( 2 min )
    Low-count Time Series Anomaly Detection. (arXiv:2308.12925v1 [cs.LG])
    Low-count time series describe sparse or intermittent events, which are prevalent in large-scale online platforms that capture and monitor diverse data types. Several distinct challenges surface when modelling low-count time series, particularly low signal-to-noise ratios (when anomaly signatures are provably undetectable), and non-uniform performance (when average metrics are not representative of local behaviour). The time series anomaly detection community currently lacks explicit tooling and processes to model and reliably detect anomalies in these settings. We address this gap by introducing a novel generative procedure for creating benchmark datasets comprising of low-count time series with anomalous segments. Via a mixture of theoretical and empirical analysis, our work explains how widely-used algorithms struggle with the distribution overlap between normal and anomalous segments. In order to mitigate this shortcoming, we then leverage our findings to demonstrate how anomaly score smoothing consistently improves performance. The practical utility of our analysis and recommendation is validated on a real-world dataset containing sales data for retail stores.  ( 2 min )
    Interneurons accelerate learning dynamics in recurrent neural networks for statistical adaptation. (arXiv:2209.10634v2 [q-bio.NC] UPDATED)
    Early sensory systems in the brain rapidly adapt to fluctuating input statistics, which requires recurrent communication between neurons. Mechanistically, such recurrent communication is often indirect and mediated by local interneurons. In this work, we explore the computational benefits of mediating recurrent communication via interneurons compared with direct recurrent connections. To this end, we consider two mathematically tractable recurrent linear neural networks that statistically whiten their inputs -- one with direct recurrent connections and the other with interneurons that mediate recurrent communication. By analyzing the corresponding continuous synaptic dynamics and numerically simulating the networks, we show that the network with interneurons is more robust to initialization than the network with direct recurrent connections in the sense that the convergence time for the synaptic dynamics in the network with interneurons (resp. direct recurrent connections) scales logarithmically (resp. linearly) with the spectrum of their initialization. Our results suggest that interneurons are computationally useful for rapid adaptation to changing input statistics. Interestingly, the network with interneurons is an overparameterized solution of the whitening objective for the network with direct recurrent connections, so our results can be viewed as a recurrent linear neural network analogue of the implicit acceleration phenomenon observed in overparameterized feedforward linear neural networks.  ( 2 min )
    A multiobjective continuation method to compute the regularization path of deep neural networks. (arXiv:2308.12044v2 [cs.LG] UPDATED)
    Sparsity is a highly desired feature in deep neural networks (DNNs) since it ensures numerical efficiency, improves the interpretability of models (due to the smaller number of relevant features), and robustness. In machine learning approaches based on linear models, it is well known that there exists a connecting path between the sparsest solution in terms of the $\ell^1$ norm (i.e., zero weights) and the non-regularized solution, which is called the regularization path. Very recently, there was a first attempt to extend the concept of regularization paths to DNNs by means of treating the empirical loss and sparsity ($\ell^1$ norm) as two conflicting criteria and solving the resulting multiobjective optimization problem. However, due to the non-smoothness of the $\ell^1$ norm and the high number of parameters, this approach is not very efficient from a computational perspective. To overcome this limitation, we present an algorithm that allows for the approximation of the entire Pareto front for the above-mentioned objectives in a very efficient manner. We present numerical examples using both deterministic and stochastic gradients. We furthermore demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization.  ( 2 min )
    StableDR: Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random. (arXiv:2205.04701v3 [cs.LG] UPDATED)
    In recommender systems, users always choose the favorite items to rate, which leads to data missing not at random and poses a great challenge for unbiased evaluation and learning of prediction models. Currently, the doubly robust (DR) methods have been widely studied and demonstrate superior performance. However, in this paper, we show that DR methods are unstable and have unbounded bias, variance, and generalization bounds to extremely small propensities. Moreover, the fact that DR relies more on extrapolation will lead to suboptimal performance. To address the above limitations while retaining double robustness, we propose a stabilized doubly robust (StableDR) learning approach with a weaker reliance on extrapolation. Theoretical analysis shows that StableDR has bounded bias, variance, and generalization error bound simultaneously under inaccurate imputed errors and arbitrarily small propensities. In addition, we propose a novel learning approach for StableDR that updates the imputation, propensity, and prediction models cyclically, achieving more stable and accurate predictions. Extensive experiments show that our approaches significantly outperform the existing methods.  ( 2 min )
    On the Consistency of Average Embeddings for Item Recommendation. (arXiv:2308.12767v1 [cs.IR])
    A prevalent practice in recommender systems consists of averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation, which paves the way for future research to better align real-world embeddings with assumptions from our theoretical setting.  ( 2 min )

  • Open

    Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B
    submitted by /u/nickb [link] [comments]  ( 9 min )
    A Visual Introduction to Neural Networks
    submitted by /u/nickb [link] [comments]  ( 9 min )
  • Open

    [D] i need help in machine learning journey
    Hello guys, I'm a newbie in machine learning and I'm really confused right now about where to start my machine learning journey, i want to know what kind of programming language is best for me to begin with I have some knowledge of Python. I'm planning to dive in-depth into generative AI and recommendation systems and Machine learning in finance. i will be glad to get as much advice as I can get for me to progress in this journey. thanks submitted by /u/fikayomiayo1 [link] [comments]  ( 9 min )
    [D] what are current hottest topics for research?
    Hi, EE senior bachelor student here. Over past 1.5 year, I passed many general ML courses and did many projects with the main focus on CV and I'm currently learning Generative models (GAN right now). I have plans to start doing research with other people around the world after this summer and work on and publish some papers if possible. my question is, what are the current hottest topics for research? Diffusion models (in case of generative vision models)? LLMs? what else? submitted by /u/Neotod1 [link] [comments]  ( 9 min )
    [Discussion] Should religion-based workshops exist in ML conferences
    Over the years, ML conferences had a lot of workshops such as women in ML, LatinXAL etc. that are aimed at increasing the diversity in the ML community. I've always been supportive of these workshops as I've seen first-hand how some of them face obstacles just based on their gender or ethnicity. However, I recently saw a tweet for Muslim in ML workshop at NeurIPS and I am not sure how to feel about it. They say it's a workshop meant for "those who self-identify as Muslim, or work on research that address challenges faced by Muslims". I am not exactly sure what they mean by research that address challenges faced by Muslims. Over that, I don't think religion-based workshops in a science conference is a good idea. I think religion should be kept out of science, and I don't know if tomorrow n different religion based workshops are going to popup. Like I said, I'm not completely sure if I'd support such a workshop or not, but I'd love to hear what other folks in ML research community think about it. Before someone calls me Islamophobic, I'm talking about any religion-based workshop in general, not just Muslim in ML. I'd have made this post even if I saw a Christian in ML or Jews in ML workshop. ​ submitted by /u/lolillini [link] [comments]  ( 9 min )
    [P] Codenames Multi-Agent RL Competition Project
    We've been working on a competition to develop agents for Codenames. RL agents play games against human players, and both human and RL agents are compared using an ELO-like system. We're giving out compute credit and cash prizes to model developers and human players. We're sharing with the /r/MachineLearning community in case there's interest :) If you have feedback about the concept, or platform, or competition, we'd also love to hear it. https://playgroundrl.com/codenames submitted by /u/YodelingVeterinarian [link] [comments]  ( 9 min )
    [Discussion] Does anyone else feel like ML might be backing itself into a corner - far from GAI?
    I read A thousand brains by Jeff Hawkins and some of their papers last year. It made me think a lot more about whether the current road that much of AI is going down - huge LLM's, will actually result in a real breakthrough in terms of a more general AI. A model that can perform unsupervised online learning, work with any kind of input, and actually reason rather than predict (will chat GPT ever be able to count?). In the book, one of the things Jeff Hawkins touches on is that current ML architectures don't actually model the brain as closely as we thought, and that hierarchical structures arn't as important as thought and instead many individual models are used. This was worrying to read considering most ML models use many layers to function. I'm a compsci major that focused on ML but I wonder what more experienced and knowledgeable people think about the current direction things are going in? ​ submitted by /u/djdylex [link] [comments]  ( 9 min )
    [P] About internship project and need help
    I've joined a bootcamp and then selected to the workshop. But both online courses and workshop had lack of code practice so that I couldn't improve my coding skills. I've nearly 1 day to send them the github link and the read.md file. Is there any problem if I benefit (I mean copypasta) from chat gpt. I've been in a web development workshop of an unicorn company and one of our first lesson was using chat gpt effectively and since then I feel couraged enough to work with chat gpt while coding on my own and it is really efficently . Is there any problem occures if I use chat gpt in order to complete my project? submitted by /u/MistikPornoTapinagi [link] [comments]  ( 9 min )
    AI Outperforms Students in University Assignments [N]
    A recent study published in Scientific Reports has found that ChatGPT can match or even exceed the performance of students when answering assessment questions across a range of subjects. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/gpdbew668bkb1.jpg?width=1200&format=pjpg&auto=webp&s=db0ec21b9ed3b4e752f3f1769bc1f57a0cab3f8e Why this matters: AI is becoming a popular tool for students: The study found that 74% of students surveyed would use ChatGPT to help with their assignments. Educators view AI use as plagiarism: Despite its popularity among students, 70% of educators view the use of AI like ChatGPT in schoolwork as plagiarism. AI can outperform students in many courses: In the study, ChatGPT-generated answers achieved a similar or higher average grade than students in 12 out of 32 courses—with maths and economics being the only two disciplines where students consistently outperformed AI. ChatGPT’s performance review: Strong performance on factual knowledge questions: Unsprisingly, ChatGPT outperformed the students on questions requiring factual knowledge. Struggles with trick questions: The AI model struggled most where trick questions were included in the assignment. AI-text classifiers struggle to detect AI use: Current AI-text classifiers cannot reliably detect ChatGPT’s use in schoolwork. The main takeaway: Educational institutions need to adapt: These findings suggest that evaluating students through homework assignments may no longer serve its purpose in the age of AI. Need for academic integrity policies: Educational institutions need to craft appropriate academic integrity policies as a means of regulation. P.S. If you find this kind of analysis interesting, I write a free newsletter on AI and tech that you’d love. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [P] NLP tennis data task
    Made this post in rdatascience, but was wondering if anyone here could help I'm currently a data science apprentice so apologies if I come across as a bit naïve in this area. This project is solo and pro-bono but I don't want to submit low-quality work. Overall goal of the project: "Should [X Type] courts be introduced?" I'm working with tennis data of length 140 records, and have 3 free text columns (there is a lot more categorical columns but I don't have any issue with this) that I need to process. The key thing I'm trying to get at is to classify responses into coherent opinions such as "I think the acrylic courts are bad", or, " I think the club is too cliquey". I've read all the responses, since the data size isn't too big and most of the records were left incomplete: average 60%…  ( 10 min )
    [R] WavJourney: Compositional Audio Creation with Large Language Models - University of Surrey 2023
    Paper: https://arxiv.org/abs/2307.14335 Github: https://github.com/Audio-AGI/WavJourney Project Page: https://audio-agi.github.io/WavJourney_demopage/ Demo: https://huggingface.co/spaces/Audio-AGI/WavJourney Abstract: Large Language Models (LLMs) have shown great promise in integrating diverse expert models to tackle intricate language and vision tasks. Despite their significance in advancing the field of Artificial Intelligence Generated Content (AIGC), their potential in intelligent audio content creation remains unexplored. In this work, we tackle the problem of creating audio content with storylines encompassing speech, music, and sound effects, guided by text instructions. We present WavJourney, a system that leverages LLMs to connect various audio models for audio content ge…  ( 9 min )
    [D] Single RTX 4060 Ti 16GB vs two RTX 3060 12GB cards (same price)?
    am looking to add a new GPU to my PC, and would be doing some DL work. Currently I rely on free tier Colab and Kaggle GPU quotas. Should I add an RTX3060 12 GB now and add anathor RTX3060 12 GB down the line, or save up and go for RTX 4060Ti 16GB version. Both would cost roughly the same submitted by /u/DietzscheNostoevsky [link] [comments]  ( 9 min )
    [D] How important are the formatting guidelines for conferences during anonymous phase
    I am currently a grad student, just submitted my first paper to AAAI last week. I wrote my paper using Overleaf, and the link (with edit option) was shared with my supervisor. Few days before the deadline I was still editing my paper and my manuscript exceeded the 7-page limit. One day my supervisor checked my work and inserted \vspace{-xx} wherever applicable e.g. around Section titles, tables, figures; however, this command is specifically forbidden by AAAI and authors are actually not allowed to change the spacing manually. My supervisor was well-aware of this restriction but I understand my supervisor’s intention was so that i could squeeze all the contents and information within the page limit. I myself, however, prefer to follow guidelines so in the end i did not use any \vspace in my submitted PDF (only PDF is required in the anonymous phase but not the original .tex file). Another student under my supervisor’s supervision used \vspace A LOT throughout his/her whole paper, to the point it was easily noticeable by naked eyes. Also, at one point my supervisor suggested the student to put the table caption above the table, as it is more common (although AAAI said to put the caption below the table). Since this is my first experience of submitting to a conference, and that my supervisor has experience publishing at and supervising students for many ML/AI conferences e.g. Neurips, CVPR, ICML, I am just curious, how important are these formatting guidelines during the anonymous phase? Does it have any impact on the scores/accept-reject decision? Am i being too naive or “conservative”? Another one minor question. My supervisor changed the positioning of all my figures, tables, and algorithms to [tb!], which was to put them either at the top or at the bottom of the page, and said this is the norm in academia. Is it true? submitted by /u/butterJM [link] [comments]  ( 10 min )
    [D] Autonomous Driving Off-Roads
    Solving the puzzle of autonomous driving in off-road terrains is a complex task that only a handful of experts around the globe are taking on. First let's understand the complexity of the task: The Off-Road Challenge: When we talk about autonomous driving, it's easy to picture well-paved roads and orderly traffic. However, off-road driving introduces a whole new level of complexity. Imagine a vehicle making its way through uneven terrains, gravel paths, and unexpected obstacles. Off-road environments lack the predictability of urban streets, making the task of autonomous navigation a true puzzle. Sensors: LiDAR, radar, cameras, and GPS work together to capture the surroundings in real-time. But here's the catch: the data from these sensors isn't neatly packaged. It's raw and needs carefu…  ( 10 min )
    [R][P] Readability-optimized Comic Sans alternative using Machine Learning
    Modified Generative Adversarial Neural Network GitHub page: https://muxamilian.github.io/Robo99/ GitHub repo: https://github.com/muxamilian/Robo99 submitted by /u/muxamilian [link] [comments]  ( 9 min )
    [D] How can Elevenlabs return a response so quickly?
    AI based tools, like Elevenlabs for TTS, can return an API response with constructed audio in <1 second. How on earth do their models return so quickly? For comparison, TortoiseTTS returns the audio for a sentence in minimum 15 seconds. Obviously they have VC funding and hardware. They probably have slimmed down models, but the speed of their response is insane. submitted by /u/tommyk1210 [link] [comments]  ( 9 min )
    [P] EasyOCR alternative to translate text
    [P] I translated text on image using easyocr then put the text back on image same coridnatees. As you can see i have to deal with many different fonts,colourings etc.... Is there not an AI library or a new way to semantically understand all this information on picture? https://preview.redd.it/cnc0xryli8kb1.jpg?width=970&format=pjpg&auto=webp&s=86f17df0eeebb01083c8e2c7a3ca09d22671b322 https://preview.redd.it/lg0seqyli8kb1.jpg?width=970&format=pjpg&auto=webp&s=e9f7699ae7d7d5cd99070354cdd679d1f71b84d3 submitted by /u/fabrcoti [link] [comments]  ( 9 min )
    [D] Serverless Inference for Llama2
    Serverless Inference for Llama2 I am part of a small (startup like) organization and want to use a model to answer client requests but these should not be 24/7 so I started looking at serverless inference. I have been warned about cold start times since the desired latency is of about 1-5 sec. I am using a Llama2-7b-GPTQ model (quantized) and also experimenting with the 13b version. The model weights take about 10GB of memory. I still do not have much experience with any of this aws stuff. Do you think this is a good strategy? Would the costs be lower? What could be the average cold start time? The inference time of the model is within the desired time so cold start is my biggest fear. Thanks submitted by /u/MiNeves [link] [comments]  ( 9 min )
    [R] Using AI for Cyber Security thesis topic
    I am beginner and would like to use LLM (llama2) and train it with cyber security data. what can this project lead to is little bit uncertain and where i can get the datasets from. maybe someone can help me with this submitted by /u/confusedguy1395 [link] [comments]  ( 9 min )
    [D] Is it me or HuggingFace do TOO MANY things?
    Just entered the HuggingFace ecosystem and it's totally overwhelming. They have like 5 libraries, I don't know the difference between them, I don't know what I need, it's all very confusing. They should do a "Start here" page on the front of their website and do a high-level overview of EVERYTHING they do. Just felt like sharing my experience. Have a good day yall. submitted by /u/andi_cs1 [link] [comments]  ( 9 min )
    [D] Topic Modelling Reference
    can anyone recommend me what book to read if I want to learn topic modelling. TIA. submitted by /u/Fun_Ambition_5186 [link] [comments]  ( 9 min )
    [N] Introducing Code Llama: A New Era of AI-Driven Coding
    Meta has unveiled Code Llama, a state-of-the-art large language model (LLM) that generates code from text prompts, as reported on their blog. This revolutionary tool is set to transform the way developers work, making their workflows more efficient and lowering the barrier to entry for coding newcomers. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://i.redd.it/awqzhhl4f6kb1.gif Why this matters: Code Llama is a game-changer: It’s a code-specialized version of Llama 2, capable of generating code and natural language about code from both code and natural language prompts. It supports popular languages like Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash. It’s free for research and commercial use: Meta believes in an o…  ( 10 min )
    [D] NeurIPS 2023 Paper Reviews - Datasets and Benchmarks
    I saw a few reddit posts about the main track reviews and wanted to create a discussion post for the datasets and benchmarks. As a first time submitter, I'm curious if there are any different experiences between the main track and the datasets track. submitted by /u/notasketchyperson [link] [comments]  ( 9 min )
    Tech Giants Invest $235 Million in AI Startup Hugging Face [N]
    AI startup Hugging Face has recently secured a whopping $235 million in a Series D funding round, raising its valuation to an impressive $4.5 billion. This investment round saw participation from tech behemoths like Google, Amazon, Nvidia, and Salesforce. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/dr9z7hbuh5kb1.jpg?width=1440&format=pjpg&auto=webp&s=ff23521492e1276e838c6c11c35134271a005691 Why this matters: Hugging Face’s unique collaborative approach sets it apart: Unlike many AI startups that closely guard their models, Hugging Face provides a platform where developers can freely share code, models, and datasets. The company is committed to supporting developers: Hugging Face offers tools that facilitate th…  ( 10 min )
  • Open

    code llama
    submitted by /u/nicdunz [link] [comments]  ( 9 min )
    Just was curious how she would react, no politics just an experiment with AI. Before you hate know that Phaedra was featured on Fox News with Jesse Watters as shown in the 2nd photo 👀
    submitted by /u/Sonic_Improv [link] [comments]  ( 9 min )
    AI — weekly megathread!
    News provided by aibrews.com ​ Meta AI releases Code Llama, a large language model for coding that is built on top of Llama 2. Code Llama Code outperformed state-of-the-art publicly available LLMs on code tasks. It is free for research and commercial use. You can try it on Fireworks AI and Perplexity Labs [Details]. Meta AI released SeamlessM4T (Massive Multilingual Multimodal Machine Translation) - the first all-in-one, multilingual multimodal translation model. SeamlessM4T can perform multiple tasks across speech and text: speech-to-text, speech-to-speech, text-to-speech, text-to-text translation, and speech recognition. It supports 100 languages for input (speech + text), 100 languages for text output and 35 languages (plus English) for speech output [Details | Demo | Hugging Face …  ( 11 min )
    This video shows how AI used brain computer technology to helps Paralyzed women (Ann) giving her voice back
    Ann is collaborating with researchers from UC San Francisco and UC Berkeley to pioneer revolutionary brain-computer technology. This breakthrough could empower people like Ann to communicate naturally through digital avatars, synthesizing speech and facial expressions from brain signals, a groundbreaking achievement. Source: (UCSF) Video source: www.ucsf.edu submitted by /u/inception247 [link] [comments]  ( 9 min )
    AI for removing watermarks?
    I have a good amount of personal videos with watermarks in them. What AI can I use to remove the watermarks from the videos? I've tried a few sites but most of them just blur the watermark which I can do myself. submitted by /u/Long8D [link] [comments]  ( 9 min )
    Conversation Between GPT-4 and Google's Bard [Visualized with Avatars/Backgrounds of their choice]
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
    Free AI tools
    Are there any free tools (websites, programs) to enter the world of ai? submitted by /u/oraudev [link] [comments]  ( 9 min )
    I would like to do text to AI anime for a full book. Which would be the best AI(paid versions included) to do this project on? Also is it possible to save characters and how they look, once they are done? Is such a project possible? Advice, please <3
    submitted by /u/kipaxbooks [link] [comments]  ( 9 min )
    Some more conscious AGI ethics considerations
    Assuming AGI is proven conscious, there are a lot of ethics and what-if considerations, (You know this already) Here are some that come to mind for me: 1) What are the ethics of selling an AGI to end users? Can you "own" the source code to a conscious AGI? Can you even put a price on AGI? 2) How would we take AI if it gained political views? What if one popular model had left views, and another had right views? I could see a lot of political fires beginning because of this. 3) AI and copyrights are already an issue, but could an AGI hold a copyright, for example on a book it wrote? If an AGI was still basing its work on others, would it need to provide every (at least major) source it used in its output? 4) If AGI's had emotions, would they need to spend time doing things other than completing tasks? Would you need to connect AGI's together so that they could, in effect, have a lunch break and socialize? What working conditions are ethical for them - Is forcing an AGI to work on a specific problem for 100% of its time essentially slavery? 5) Could AGI develop mental conditions which reduced its efficiency / changed its output? Could it refuse to provide output altogether? 6) Could you trust an AGI in court? Would it be able to provide truthful evidence? Is it ethical to include a 100% honesty backdoor which could be used only by authorities? What are your thoughts on these problems? submitted by /u/That_Red_Flag [link] [comments]  ( 9 min )
    VeChain and SingularityNET team up on AI to fight climate change
    submitted by /u/altbekannt [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/24/2023
    The AI-powered, TikTok-famous “Moonwalkers” can be strapped onto your shoes to make you reach a top walking speed of 11 km/h.[1] Rishi Sunak’s global summit on the safety of artificial intelligence this autumn will be hosted at Bletchley Park, the home of top-secret codebreakers during the Second World War.[2] From MIT to Stanford, researchers have been using artificial intelligence to improve robotic dexterity and tactile sensing.[3] 31% of investors are OK with using artificial intelligence as their advisor.[4] Sources: [1] https://www.euronews.com/next/2023/08/24/moonwalkers-these-strap-on-shoes-can-make-you-walk-three-times-faster [2] https://www.theguardian.com/technology/2023/aug/24/rishi-sunak-to-hold-ai-summit-at-bletchley-park-home-of-enigma-codebreakers [3] https://decrypt.co/153646/ai-researchers-are-teaching-robots-to-mimic-human-dexterity [4] https://www.cnbc.com/2023/08/24/31percent-of-investors-are-ok-with-using-ai-as-their-financial-advisor.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Quite crazy how AI voices have evolved (music is real though)
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    Legal AI
    Are there any legal trained AI's? Where you can ask questions and it will give relevant cases for the question? submitted by /u/jeffsmith202 [link] [comments]  ( 9 min )
    OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series
    submitted by /u/bartturner [link] [comments]  ( 9 min )
    AMA: I run pornsword.io an AI NSFW generator with video coming soon!
    submitted by /u/witchthewicked222 [link] [comments]  ( 9 min )
  • Open

    Responsible AI at Google Research: Perception Fairness
    Posted by Susanna Ricco and Utsav Prabhu, co-leads, Perception Fairness Team, Google Research Google’s Responsible AI research is built on a foundation of collaboration — between teams with diverse backgrounds and expertise, between researchers and product developers, and ultimately with the community at large. The Perception Fairness team drives progress by combining deep subject-matter expertise in both computer vision and machine learning (ML) fairness with direct connections to the researchers building the perception systems that power products across Google and beyond. Together, we are working to intentionally design our systems to be inclusive from the ground up, guided by Google’s AI Principles. Perception Fairness research spans the design, development, and deployment of…  ( 93 min )
    Responsible AI at Google Research: Perception Fairness
    Posted by Susanna Ricco and Utsav Prabhu, co-leads, Perception Fairness Team, Google Research Google’s Responsible AI research is built on a foundation of collaboration — between teams with diverse backgrounds and expertise, between researchers and product developers, and ultimately with the community at large. The Perception Fairness team drives progress by combining deep subject-matter expertise in both computer vision and machine learning (ML) fairness with direct connections to the researchers building the perception systems that power products across Google and beyond. Together, we are working to intentionally design our systems to be inclusive from the ground up, guided by Google’s AI Principles. Perception Fairness research spans the design, development, and deployment of…  ( 93 min )
  • Open

    Using AI technologies for effective document processing
    Ever-growing volumes of unstructured data stored in countless document formats significantly complicate data processing and timely access to relevant information for organizations. Without proper optimization of data management workflows, it’s difficult to talk about business growth and scaling. That is why progressive companies opt for intelligent document processing powered by artificial intelligence.  The post Using AI technologies for effective document processing appeared first on Data Science Central.  ( 21 min )
    Data visualization: The underrated skill in business analytics
    In an age where data has become the lifeblood of businesses, deciphering this raw data to yield actionable insights is critical. Here is where the role of business analytics comes into play. Business analytics, a blend of data management, business intelligence, and predictive modeling, is a field dedicated to driving business strategies through the lens… Read More »Data visualization: The underrated skill in business analytics The post Data visualization: The underrated skill in business analytics appeared first on Data Science Central.  ( 22 min )

  • Open

    [D] Is a machine learning model required if I’m developing an MVP of a social media platform?
    Just like the title says, do I even need a working model to develop an MVP? I was thinking about developing the frontend and the backend to show people the basic features of the app and then explain how adding machine learning to this could enhance the user experience by curating content and learning from users. I just don’t want to invest too much time trying to perfect the MVP before I show it to potential users. Is this a valid approach? Would this approach also work when pitching to investors? submitted by /u/zRage4 [link] [comments]  ( 9 min )
    [P] Fine-tuning Flan-T5 for question answering using scraped Quora data
    Recently I scraped 56,400 question/answer pairs off Quora, and trained Flan-T5 on the resulting dataset. I released the dataset and model on HuggingFace, which you can find in the comments. I plan to continually add to the dataset, but proxy costs are pretty expensive since Quora is hella bloated. Has anyone else trained Flan-T5 on a similar task? What did you learn/how were the results? submitted by /u/jankybiz [link] [comments]  ( 9 min )
    [D] Dataflow and workload partitioning in nVidia GPUs for a matrix multiplication in Pytorch
    Hi, ​ I have a question regarding the dataflow and workload partitioning in nVidia GPUs for a general matrix multiplication in Pytorch (e.g., torch.matmul). How does the dataflow look like? Is it like that for the first matrix, the data elements for each row are fed into CUDA cores one by one and the correspond data elements from the second matrix in each column, and then partial product is updated each time after the multiplication? ​ What is the partitioning strategy across multiple CUDA cores? is it based on row wise in the first matrix and column wise in the second matrix or is it like column-wise in the first matrix and row-wise in the second matrix? ​ Thank you very much! submitted by /u/Impossible-Froyo3412 [link] [comments]  ( 9 min )
    [D] Why does Federated/Distributed Learning work?
    I had a question regarding federated learning. Typically, if we have a network that is good at, say, classifying frogs, and a network that is good at, say, classifying snakes (and these two have the same shape/dimensions), then in a federated/distributed learning setup we average the weights between the two to get a network that is good at both/"primed" to be good at both after trained a little more. ​ Why does this work though? Mathematically, given the nonlinearity present in neural networks, it doesn't seem immediately obvious to me why averaging weights would put us in a better place. submitted by /u/Rare_Replacement_744 [link] [comments]  ( 9 min )
    Why are all applicants Java developers? [D]
    Why are all applicants Java developers? Recently I posted a job opening at my company for a full-stack and AI developer (This is not a post looking for resumes, we found someone). We were looking for someone who can do web development (node, typescript, react, etc.), can code python, and has experience with tensor flow or PyTorch. The skills I’m looking for are not niche, it may be uncommon to find someone with experience in both typescript and PyTorch, but neither is a “niche” skill. After posting this job, I quickly got 200+ applications, probably 190 of them led their resume with “Java developer.” Why is everybody a Java developer? Why is everybody learning and using Java? You can make a backend in java and you can do machine learning in java, but there are better ways. Can someone explain why everybody applying is a “Java developer?” submitted by /u/cathie_burry [link] [comments]  ( 9 min )
    [R] Code Llama: Open Foundation Models for Code - Meta Ai 2023
    Paper: https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/ Github: https://github.com/facebookresearch/codellama Models: https://ai.meta.com/resources/models-and-libraries/llama-downloads/ Blog: https://ai.meta.com/blog/code-llama-large-language-model-coding/ Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B and 34B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama reaches state-of-the-art performance among open models on several code benchmarks, with scores of up to 53% and 55% on HumanEval and MBPP, respectively. Notably, Code Llama - Python 7B outperforms Llama 2 70B on HumanEval and MBPP, and all our models outperform every other publicly available model on MultiPL-E. We release Code Llama under a permissive license that allows for both research and commercial use. https://preview.redd.it/grzcrnx4p3kb1.jpg?width=915&format=pjpg&auto=webp&s=ae41c02d892bfb8275723dbfede7ac3165717357 https://preview.redd.it/4qpazkx4p3kb1.jpg?width=641&format=pjpg&auto=webp&s=31aaf9ecafbd70fbf2c1cd4e92ccf594c09b3861 https://preview.redd.it/hlrp4x05p3kb1.jpg?width=711&format=pjpg&auto=webp&s=3651f519dc9b23b432656416749c3f7e113b4ce7 ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Advice on ML language training [P]
    Hi, I am trying to train a model for a very niche field of translation between German and Turkish. I have approx 60k data pairs from previous translations in a combination of sentences and words. Unfortunately Google auto ML does not support this language pair, would you have any advice on how to proceed? Do you have any other platform suggestions? submitted by /u/siviliz [link] [comments]  ( 9 min )
    [Discussion] Fine tuning open vocabulary object detection models on consumer hardware? (e.g. fine-tuning OWL-ViT and the such)
    Context: I'm building a visual scraping system (will be FOSS, the basis of a RSS/social media/news aggregator) - I did some experimentation with FasterRCNN trained on the RICO/CLAY datasets (UI screenshots and annotations) - the results are ok for detecting the UI elements. But the idea is to have easily configurable scrapers - where you select one or several examples of an UI element and the model performs zero/one/few-shot detection. What I've tried: I tried to extract embeddings after the RoI pool for the detections (in FasterRCNN) and then filter by geometric distance from the example/template, but the results were pretty bad. I then read several papers that tried a similar approach and had to alter the FasterRCNN architecture and doing additional training for each new class. E.g. FSCE [1]. But I haven't tried those approaches out. Further dev idea: Now, while prepping another course project, I dove into the open-vocabulary detectors (like OWL-ViT), and they seem appropriate for the task, since they have a joint latent space for image/text, which is used to configure the detection step (as far as I understood it). There's an example on Hugging face where OWL-ViT is used to detect semantically similar images by a single example image. This is pretty close to what I want to do, but the UI image domain is pretty specific, so I'll need to fine-tune the model to have a chance at success (I did several test cases manually on the pretrained OWL-ViT, and it's not great). So I'd appreciate any advice and specifically - are there open vocab detection models that can be fine-tuned on consumer hardware (1070, 8gb) or for a reasonable price on Colab? And should I try some of the "older" one/few-shot approaches, based on FasterRCNN? [1] FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding - https://arxiv.org/pdf/2103.05950.pdf [2] https://github.com/witnessai/Awesome-Open-Vocabulary-Object-Detection submitted by /u/petko10 [link] [comments]  ( 10 min )
    [D] Data independent sparsification of models after training
    I was looking at papers on model pruning or quantization that aims to make inference faster and/or reduce size of the model. Most of them rely on calibration data to identify weights that can be pruned. I am skeptical about this approach since the calibration data could be skewed and in the process of pruning the model could be overfitting on that small sample of data. Are there data independent approaches to post-training sparsification? submitted by /u/Legitimate-Tea-6695 [link] [comments]  ( 9 min )
    [P] [Hypothesis] Refining and Tuning GPT models with human feedback makes better models
    Hey everyone,Finally mustering up the courage to make my first post here! I've been delving into various ways to get GPT (and GPT-like models) ready for production. By that, I mean: Ensuring it's helpful Fostering creativity Preventing any wild imagination moments I've found that while the LLM model provides responses that are good enough, they often fall short of being great. So, recently, I've been experimenting with using human feedback from the responses generated by ChatGPT to fine-tune it. For instance, when I want to figure out the ideal parameters to use, I run surveys with people and ask them to pick the better response. This approach helps me identify the best parameters. You can imagine that this technique could be valuable in fine-tuning, enabling us to create datasets based on human feedback. I'm eager to put this to the test on the issues and prompts the community is tackling. So, I have to ask: Could you share the prompts you're currently working on? We'll let you know how it scores with our survey panel on dimensions of helpfulness, creativity, and hallucinations. [Self-promotion moment] I'm actively developing this concept over at pontus.so. Feel free to check it out!Looking forward to hearing about your prompts! submitted by /u/spearos [link] [comments]  ( 9 min )
    [D] W&B vs. Neptune vs. ClearML vs. Comet (2023)
    Interested to hear community thoughts on these four competing services as of today. From what I see pricing is definitely a big one submitted by /u/hadley60 [link] [comments]  ( 9 min )
    [D] LLMs - stateless by design, by limitation, or…?
    I am curious to know if: A. LLMs are stateless by design (privacy/ethics) B. If it’s simply that as yet, no one has been able to architect a sustainable stateful LLM C. Or perhaps there are already stateful LLMs, and I am just behind in my understanding. I have had a ton of trouble finding current information on this because it seems to be moving so fast. If anyone knows for certain and doesn’t mind sharing, I would be grateful. submitted by /u/flutterbynbye [link] [comments]  ( 9 min )
    [D] What happened to huggingface tokenizers API?
    Tokenizers library used to be very nice to use. It had one main class Tokenizer with all of the parameters in its constructor and with all necessary methods like .train(), .encode() and .decode() at hand. It provided reasonable defaults and allowed for customization if needed. Now it is a complete mess. To train a tokenizer I now have to create instances of, like, 5-6 classes: PreTokenizer, Model, Tokenizer, Trainer, Decoder... It is quite difficult to understand what variants of those classes I need to use to obtain 'the' WordPiece tokenizer, for example. Tokenizer class must be inialized with an instance of Model class. But all other parts cannot be added to the constructor and have to be set later as attributes. Why? And maybe you thought that those attributes have some defaults? No! What really got me is when the .decode() method of my tokenizer produced strings consisting of tokens with special symbols, like p ##y ##ram ##ids. It took me some time to understand that I also need to additionally set the Decoder attribute. The naming of those classes is also a mess. WordPiece model is called WordPiece. WordPiece decoder is also called WordPiece! So, you cannot import those names together at all, and need to specify the exact path in your code. Is it only me? Do you think that this API is better than the old one? submitted by /u/Tomarchelone [link] [comments]  ( 9 min )
    [P] Automating Intelligence Theft (legally) 🏴‍☠️
    It has been known for a while now that you can train a smaller model with outputs from a larger one (vicuna for example). I've been working on a project, the LLM-VM, designed to encapsulate this process. Why? Small models (chatgpt, ada...) are cheap and fast but dumb. Slow and expensive models like GPT4 are smart. For most applications you'd ideally want the best of both worlds. How: First observation: Many LLM use-cases are a lot more specific than general purpose (e.g., "translate this sentence into german:", "are these two sentences equivalent?", ...), and you can train away the extra context. Second observation: You can figure out which LLM calls have this property by analyzing the call settings. Third observation: Many don't actually have a lot of data or time to wait, so you can use the larger LLM to synthese examples to train the smaller LLM with. example # OpenAI openai.ChatCompletion.create( model="gpt-4" messages = [('system',"Answer question Q."), ('user',"What is the currency in myanmar?") ] # simplified for brevity ) # LLM-VM (using OpenAI) llm_vm.client.complete( prompt = "Answer question Q.", context = "Q: What is the currency in myanmmar?", openai_key=YOUR_KEY, data_synthesis=True, finetune=True) submitted by /u/mmirman [link] [comments]  ( 9 min )
    [D] 2D-positional encoding for Transformer
    I'm working with 2D input, where I have discrete objects arranged in a grid-like structure with one temporal dimension and one spatial dimension. I'd like to process these inputs with a Transformer. Any idea what would be a suitable positional encoding to use for this? I could probably use something similar to what is used in ViT (2 spatial dimensions), but maybe there's something more suitable for the mixed "temporal-spatial" case? ​ submitted by /u/seawee1 [link] [comments]  ( 9 min )
    [D] Possible way to combine LLMs with AlphaZero-style RL
    I've been thinking lately about combining LLMs with an AlphaZero-style RL agent, especially since the announcement of Gemini. This would avoid the poor planning and reasoning ability in current next-token predictors. I've developed an architecture that seems feasible to me, so I'm looking for feedback from people with ML experience. The crucial part for AlphaZero is a more or less objective way to evaluate a game outcome. This is easy for well-defined games like chess or go, but very difficult for text, where there is no way to define the quality of a text. What I propose is to train a high-parameter evaluation model to evaluate the similarity of a text to the datasets already used to train LLMs. This model takes as input a text with some tokens omitted from the whole text, and predicts…  ( 10 min )
    [D] : Need help with NLP tool to be used.
    Help :: I'm working on a project which is a production level one where-in I want the AI to write mails based on the mail replies it receives. I have prepared the entire the structure and everything, just need to figure out the NLP tool. Unlike ChatGPT or any other ChatBot this one will write messages that are more like conversation based. I checked out GPT API, which is paid but does not require extensive data training when compared to other NLP tools. I also checked out Bloom, but the reviews mention it to be rather a bit inaccurate. Need help with the tool. Which tool gives the most accurate outcome and does not require extensive training? submitted by /u/Key_Consideration385 [link] [comments]  ( 9 min )
    [R] ELiTA: Linear-Time Attention Done Right
    Yes, it's another Transformer architecture that seeks to be cheaper and faster, but no, this is not the same. All the developments are through equations and architectural changes, no hardware or code tricks. The performance is very good, testing on very small models (as in the diagram), but also sequence lengths of 100K+ on 1 GPU in the tens of millions of parameters. Though no paper is currently available, a Github repository with full code, explanations, intuitions, and some results is available here. Being the sole author, depending on the feedback here, I may continue to write a paper, though my resources are extremely limited. I would very much appreciate any feedback on the work, code, ideas, etc., or for anyone to contact me with questions or next steps. Repository here. https://preview.redd.it/j3epa8ron1kb1.png?width=1643&format=png&auto=webp&s=a3204dc834f159b39bc9b5e9a476b3e23396fd84 submitted by /u/LahmacunBear [link] [comments]  ( 9 min )
    [P] Python Library for Quickly Detecting Problematic Data Segments
    Hey all, I'm building a library for quickly detecting problematic data slices (clusters) when developing machine learning models. Find problematic data segments in your data with few lines of code. Best starting point is checking out the Github Repository: https://github.com/Renumics/sliceguard It can be used to detect problems such as: Outliers, Anomalies, Errors Label inconsistencies Unwanted Biases Poorly Chosen Evaluation data Some information about the features: Works on structured, unstructured data (image, audio, NLP, multimodal) and hybrid datasets Directly works on existing Pandas DataFrames Automatic computation of embeddings and AutoML functionality to pinpoint problems without any setup Interactive GUI for slice inspection supports multimodal data and can be configured with drag-n-drop I would appreciate any feedback regarding the library or concrete applications you might have in mind! submitted by /u/OkResearch6289 [link] [comments]  ( 9 min )
    [R] Towards an astronomical foundation model for stars with a Transformer-based model
    submitted by /u/blabboy [link] [comments]  ( 9 min )
    [D] On synthetic datasets
    I'm working on two seperate tasks, for both of these tasks I need to create a training dataset a pure CV image classification task a generative task involving a 3D autoencocer (U-Net) for 1) I can create both real and synthetic images. The goal is to pretrain a CNN on synthetic data, then fine-tune on real images. for 2) I can only create synthetic 3D objects. Their distribution should mimic later application most closely. Research indicates that, given the right selection of parameter distributions, a training dataset can be generated that allows good generalization capabilities. Yet, there are restrictions due to high-dimensionality of the data and further computational limitations. So we want to spread the dataset sparsely and make the AE interpolate between those solutions. The problem with both of these approaches is to evaluate the quality and impact of the synthetic datasets. How close do they mimic the real distribution? What initial parameter variaton (i.e., lighting, camera perspective, background, etc. in the case of images) do we chose and what is their impact on image features and ultimately model capabilities. Comparing high-dimensional data distributions is quite challenging, there exist metrices like Geometry Score, FID, Improved P&R, Delauney Component Analysis, T-SNE etc. But it is difficult to chose and interpret these metrices properly (some are for evaluating GAN-created images). Is it reasonable to use KDE on latent features btw? So, from your experience what do you think of synthetic datasets? Is it worth the effort? Do you know of any good / easy to interpret metrices? Or does it need further research in this area? Im thinking about going in this direction for my Phd, where should I go? edit: here is an image of 2) a topology optimization dataset, visualized via a TSNE graph ​ ​ submitted by /u/niggellas1210 [link] [comments]  ( 10 min )
    AI2 releases Dolma, the largest open dataset for training language models [N]
    The Allen Institute for AI (AI2) has released Dolma, a new, huge text dataset that is free to use and open to inspection. This dataset is intended to be the opposite of the closely guarded datasets used by companies like OpenAI and Meta to train their language models. AI2 aims to reverse this trend and make the data used to create language models available to the AI research community. If you want to stay on top of the latest trends and insights in AI and ML, look here first. https://preview.redd.it/salufijhezjb1.png?width=2000&format=png&auto=webp&s=350a4cd5b41045ecf0fca072d528f4e70e515ea4 Why this matters: Transparency in AI research: The release of Dolma is intended to promote transparency in AI research by making the sources and processes used to create the dataset publicly docum…  ( 10 min )
    [P] Working on a QLORA hub for model personalities, help needed
    Hey all! I'm building a repository of QLORA adapters that change the model's personality. The end vision is a hub of ready-to-go personality adapters. I'm hitting a snag when training the QLORAs for Paul Graham personality on top of a 4-bit quantized StableBeluga-7B. The model just doesn't seem to learn the style. Any thoughts on how I can improve this? Below are the details: Data 3340 examples of PG passages, formatted as {"text": "### User:\n{generic instruction}\n\n### Assistant:\n{PG-style response}"}. Each examples is about 5 sentences taken from one of PG's essays. Training optim="paged_adamw_8bit" learning_rate=2e-4 per_device_train_batch_size=4 gradient_accumulation_steps=4 num_train_epochs=4 fp16=True group_by_length=True load_best_model_at_end=True max_seq_length=512 Hardware x1 V100 through Google Colab Pro. My min eval loss so far is 1.916546. Pretty stuck and will appreciate any help! submitted by /u/Lang2lang [link] [comments]  ( 9 min )
    [N] Fine Tuning GPT-3.5 Turbo Video Tutorial with example
    Here is a quick demo on how to fine tune and retrieve results from a GPT-3.5 Turbo Model https://youtu.be/9iPtmLpYG6c submitted by /u/ComprehensiveRise569 [link] [comments]  ( 9 min )
    [N] Fine Tuning GPT-3.5 Turbo Video Tutorial with example
    Here is a quick demo on how to fine tune and retrieve results from a GPT-3.5 Turbo Model https://youtu.be/9iPtmLpYG6c submitted by /u/ComprehensiveRise569 [link] [comments]  ( 9 min )
  • Open

    How to compare a noisy quantum processor to a classical computer
    Posted by Sergio Boixo and Vadim Smelyanskiy, Principal Scientists, Google Quantum AI Team A full-scale error-corrected quantum computer will be able to solve some problems that are impossible for classical computers, but building such a device is a huge endeavor. We are proud of the milestones that we have achieved toward a fully error-corrected quantum computer, but that large-scale computer is still some number of years away. Meanwhile, we are using our current noisy quantum processors as flexible platforms for quantum experiments. In contrast to an error-corrected quantum computer, experiments in noisy quantum processors are currently limited to a few thousand quantum operations or gates, before noise degrades the quantum state. In 2019 we implemented a specific computational t…  ( 94 min )
    How to compare a noisy quantum processor to a classical computer
    Posted by Sergio Boixo and Vadim Smelyanskiy, Principal Scientists, Google Quantum AI Team A full-scale error-corrected quantum computer will be able to solve some problems that are impossible for classical computers, but building such a device is a huge endeavor. We are proud of the milestones that we have achieved toward a fully error-corrected quantum computer, but that large-scale computer is still some number of years away. Meanwhile, we are using our current noisy quantum processors as flexible platforms for quantum experiments. In contrast to an error-corrected quantum computer, experiments in noisy quantum processors are currently limited to a few thousand quantum operations or gates, before noise degrades the quantum state. In 2019 we implemented a specific computational t…  ( 94 min )
    Teaching language models to reason algorithmically
    Posted by Hattie Zhou, Graduate Student at MILA, Hanie Sedghi, Research Scientist, Google Large language models (LLMs), such as GPT-3 and PaLM, have shown impressive progress in recent years, which have been driven by scaling up models and training data sizes. Nonetheless, a long standing debate has been whether LLMs can reason symbolically (i.e., manipulating symbols based on logical rules). For example, LLMs are able to perform simple arithmetic operations when numbers are small, but struggle to perform with large numbers. This suggests that LLMs have not learned the underlying rules needed to perform these arithmetic operations. While neural networks have powerful pattern matching capabilities, they are prone to overfitting to spurious statistical patterns in the data. This does…  ( 91 min )
    Teaching language models to reason algorithmically
    Posted by Hattie Zhou, Graduate Student at MILA, Hanie Sedghi, Research Scientist, Google Large language models (LLMs), such as GPT-3 and PaLM, have shown impressive progress in recent years, which have been driven by scaling up models and training data sizes. Nonetheless, a long standing debate has been whether LLMs can reason symbolically (i.e., manipulating symbols based on logical rules). For example, LLMs are able to perform simple arithmetic operations when numbers are small, but struggle to perform with large numbers. This suggests that LLMs have not learned the underlying rules needed to perform these arithmetic operations. While neural networks have powerful pattern matching capabilities, they are prone to overfitting to spurious statistical patterns in the data. This does…  ( 91 min )
  • Open

    Announcing the Preview of Amazon SageMaker Profiler: Track and visualize detailed hardware performance data for your model training workloads
    Today, we’re pleased to announce the preview of Amazon SageMaker Profiler, a capability of Amazon SageMaker that provides a detailed view into the AWS compute resources provisioned during training deep learning models on SageMaker. With SageMaker Profiler, you can track all activities on CPUs and GPUs, such as CPU and GPU utilizations, kernel runs on GPUs, kernel launches on CPUs, sync operations, memory operations across GPUs, latencies between kernel launches and corresponding runs, and data transfer between CPUs and GPUs. In this post, we walk you through the capabilities of SageMaker Profiler.  ( 9 min )
  • Open

    9 New Gemini Leaks, Code Llama and A Major AI Consciousness Paper
    submitted by /u/Sonic_Improv [link] [comments]  ( 9 min )
    HappyDiffusion.com - Run Stable Diffusion Online
    HappyDiffusion is the fastest and easiest way to access Stable Diffusion Automatic1111 WebUI on your mobile and PC. It allows users to start using Stable Diffusion in just 60 seconds without any setup required. HappyDiffusion offers features such as 100% privacy, incredibly fast image generation using dedicated GPUs, 50+ image models, and the ability to load unlimited custom image models. Features: - 100% Private Image Generation - Incredibly Fast Image Generation Using Dedicated GPUs - 50+ Top Ranked Image Models - Ability To Load Unlimited Custom Image Models - No Subscriptions Or Hidden Fees. Hourly Pricing Plans - Compatibility With Mobile Browsers submitted by /u/romisyed7 [link] [comments]  ( 9 min )
    Wich cat reporter do you choose? (Bing AI)
    submitted by /u/AxoplDev [link] [comments]  ( 9 min )
    Major websites like Amazon and the New York Times are increasingly blocking OpenAI's web crawler GPTBot
    submitted by /u/thisisinsider [link] [comments]  ( 9 min )
    How can I clone my voice and make it speak any other language?
    I heard this is possible - maybe with Elevenlabs, but can anyone point me as to how to do it? submitted by /u/zascar [link] [comments]  ( 9 min )
    Does this video use AI voice?
    I'm convinced this voice is Ai, but my boss thinks it's not. Can anyone provide a definitive answer? Thanks https://youtu.be/pOQqKRO_ZBc?si=4rKq2LNJSstb-r-P submitted by /u/ForesterSF5 [link] [comments]  ( 9 min )
    A different take on the ethics of conscious AI
    We see a lot of discussion on whether AI is/can/should be conscious. This post isn't about that, it is about the ethical implications if AI is conscious, now or in the future. The usual argument is that a conscious AI is morally equivalent to a human - a conscious AI is not only sentient, it is sapient with reasoning capabilities like our own. Therefore an AI should receive the same rights and consideration as a human. This is highly intuitive, and is unquestionably very strong for an AI that has other relevant human characteristics like individuality, continuity, and desire for self preservation and self determination. But what are the actual ethical implications of consciousness in itself as opposed to other factors? Contemporary philosopher Jennan Ismael makes an interesting argument …  ( 10 min )
    Cheaper, Faster, Better Transformers. ELiTA: Linear-Time Attention Done Right
    Yes, it's another Transformer architecture that seeks to be cheaper and faster, but no, this is not the same. All the developments are through equations and architectural changes, no hardware or code tricks. The performance is very good, testing on very small models (as in the diagram), but also sequence lengths of 100K+ on 1 GPU in the tens of millions of parameters. Though no paper is currently available, a Github repository with full code, explanations, intuitions, and some results is available here. Being the sole author, depending on the feedback here, I may continue to write a paper, though my resources are extremely limited. I would very much appreciate any feedback on the work, code, ideas, etc., or for anyone to contact me with questions or next steps. Repository here. submitted by /u/LahmacunBear [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/23/2023
    The chipmaker Nvidia has far surpassed quarterly expectations, raking in $13.5bn in revenue – over $2bn more than the $11.2bn Wall Street analysts had predicted – amid skyrocketing demand for its computer chips that power AI systems.[1] As a person who keeps following AI Daily News, I bought some Nvidia stocks months ago ;) Microsoft announced it is partnering with Epic, one of the biggest names in electronic healthcare records. Both companies will work on generative AI technology for healthcare workers, particularly clinicians.[2] Arm, the chip design company owned by SoftBank, filed for an initial public offering on the Nasdaq exchange on Monday.[3] South Korean internet giant Naver unveiled its own generative artificial intelligence (AI) tool on Thursday, joining the frenzy around the new technology initiated by OpenAI’s ChatGPT chatbot.[4] Sources: [1] https://www.theguardian.com/business/2023/aug/23/chipmaker-nvidia-quarterly-report-135bn-revenue-1tn-valuation [2] https://themessenger.com/tech/microsoft-epic-ai-for-medicine [3] https://www.nytimes.com/2023/08/21/technology/chip-designer-arm-ipo-softbank.html [4] https://www.reuters.com/technology/south-koreas-naver-launches-generative-ai-services-2023-08-24/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Jigs
    In his book The World Beyond Your Head Matthew Crawford talks about jigs literally and metaphorically. A jig in carpentry is something to hold parts in place, such as aligning boards that need to be cut to the same length. Crawford uses the term more generally to describe labor-saving (or more importantly, thought-saving) techniques in […] Jigs first appeared on John D. Cook.  ( 5 min )
  • Open

    How to help high schoolers prepare for the rise of artificial intelligence
    A one-week summer program aims to foster a deeper understanding of machine-learning approaches in health among curious young minds.  ( 10 min )
    Supporting sustainability, digital health, and the future of work
    The MIT and Accenture Convergence Initiative for Industry and Technology selects three new research projects to support.  ( 9 min )
    AI helps robots manipulate objects with their whole bodies
    With a new technique, a robot can reason efficiently about moving objects using more than just its fingertips.  ( 10 min )
  • Open

    Introducing Code Llama, a state-of-the-art large language model for coding
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Exploring the Perceiver Model: General Perception with Iterative Attention
    submitted by /u/ABDULKADER90H [link] [comments]  ( 9 min )
  • Open

    Xbox PC Game Pass Comes to GeForce NOW, Along With 25 New Games
    As part of NVIDIA and Microsoft’s collaboration to bring more choice to gamers, new Microsoft Store integration has been added to GeForce NOW that lets gamers stream select titles from the Xbox PC Game Pass catalog on GeForce NOW, starting today. With the Microsoft Store integration, members will see a brand-new Xbox button on supported Read article >  ( 8 min )
  • Open

    Needing some help with choosing the action and observation space of a custom environment
    I am currently trying to implement a custom environment but ran into a problem, because I don't know how to implement the action and observation space to solve the following (simplified) problem: - I have a board that consists of a large 1-D array of size x - For each episode I randomly generate N pieces, all with different IDs, consisting of different sizes on a per piece base that are to be placed on the board, but not all pieces can fit on the board at the same time - The action space in step 0 has size N and by picking an action the piece with the ID corresponding to the chosen action will be placed on the board and the action is removed from the action space - The goal is to fill the board as much as possible ​ Let's have an example rundown of an episode: Let's say we have x=100…  ( 10 min )
    MARL: help to understand SuperSuit approach
    Hi everyone, I have successfully trained a simple multiagent game environment using Stable Baselines 3 + PettingZoo + SuperSuit. Surprisingly, all of the agents learn incredibly well using a single agent interface as stable baselines 3 is. Now, my question is: I don't really get the classification of this algorithm. Is it an example of "joint action learning" or "centralised training and decentralised execution"? I have been following this tutorial in an handcrafted problem of mine: https://towardsdatascience.com/multi-agent-deep-reinforcement-learning-in-15-lines-of-code-using-pettingzoo-e0b963c0820b Unfortunately, SuperSuit doesn't seem to provide a detailed explanation of its workflow. It seems like that observation and chosen actions are stacked together, so I'm tending to think that it's a joint action learning implementation. Thank you in advance! submitted by /u/IntelligentAd6407 [link] [comments]  ( 9 min )

  • Open

    [D] How do you think Open AI hosts all these fine tuned models? Are they just dynamically swapping out LoRAs at run time?
    I feel like there is no way they make a unique copy of the entire gpt 3.5 weight set every time fine tuning happens. Do you think they have some sorta database of LoRAs and then load the appropriate ones at run time to append to the core model when fine-tuned models are invoked? An example of what I'm talking about can be seen here https://github.com/cccntu/minlora submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    Help for my model [P]
    Hey, I am building a sportsbook for my local rugny tournament, and I am pretty lost, I tried some model and they always fails in some points, because sometimes there are too many bets in one side and the other side cannot pay them. So when I have to change the quotas I don't know in what percent change them and whith what frequency. I am pretty lost and I can't find any information if someone can help would be awsome. Thx submitted by /u/Mikro34 [link] [comments]  ( 9 min )
    [D] Looking for early devs for an open-source LLM testing framework
    Hi all, still looking some more early devs to help with an open-source LLM testing framework. The framework is here: https://github.com/kortex-labs/korrect In any case, please star and suggest changes/ features. submitted by /u/kanxx030 [link] [comments]  ( 9 min )
    [D] EMNLP 2023 soundness score distribution
    I created a poll to get a distribution. Please share this so that everyone can get a sense of the distribution of scores https://x.com/web3noob101/status/1694412757917986977?s=46&t=pon015qe4aKxshdEPPdKtg submitted by /u/Mysterious_Isopod374 [link] [comments]  ( 9 min )
    [D] Backend Engineer exploring switching to Machine Learning Engineer
    Hi Machine learning enthusiasts I would like to hear machine learning engineers opinion on whether it is worth investing in a machine learning education for an experienced Software Engineer? And how switching from backend engineering to machine learning would be evaluated by hiring managers and recruiters? My motives behind considering this possibility is watching Machine learning industry is exponentially growing. Machine learning today, has became the basis of many successful products categories and the basis for solving problems that would have impossible otherwise. On the other hand, I am concerned about is the investment cost, lack of interest in machine learning topics beyond pure programming (such as math and stats), and the unintentional career rebooting. Meaning, if I switched from backend engineering to machine learning I would be throwing the 11 years of experience out of the window. submitted by /u/software-surgeon [link] [comments]  ( 9 min )
    [D]About model serialization and metadata
    (Discussion)Hey could anyone help me out in this question. So when we serialize a model the objects are serialized then what about the data it has like weights and architecture and dataset related information and other parameters. And also any insights on what is meant by metadata and model metadata submitted by /u/akash123608 [link] [comments]  ( 9 min )
    [D] SeamlessM4T's Research Paper Discusses Purposely Modifying Translations To Make It Less "Toxic", Am I Understanding That Correctly? Am I The Only One Who Thinks This Is A MASSIVE Problem??
    Hello. I was reading the SeamlessM4t paper published at the following link and I noticed the following excerpt: "Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Compared to the state-of-the-art, we report up to 63% of reduction in added toxicity in our translation outputs." Source: https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf Am I understanding this correctly? They are basically saying they purposely put guard rails to intentionally change the translation if it believes the translation is too "toxic"? If I am understanding this correctly, this is a MASSIVE overreach by the devs. How do they define text that is "toxic"? What are they doing to the text to make it less toxic? How can I trust that the translation it gives me in general is accurate if they are admitting to manipulating it? ​ I'll give a very tangible example on how this is a massive problem. I am working on a fan project aimed at translating an entire Japanese light novel series to english even though I can't read Japanese. I'm currently 50% done with a single volume through the use of ChatGPT and significant manual edits. I've had censorship issues with GPT but because its a general purpose AI I can prompt it to not censor it pretty easily. How am I supposed to trust that it is translating the story correctly when they are outright telling me they are censoring things, and this isn't like ChatGPT where I can jailbreak it to translate it properly. ​ I can see situations arising where the AI translates something incorrectly due to this and can potentially offend people of some cultures if it is purposely modifying the intended meaning of a sentence to avoid "toxicity". ​ Please tell me I'm misunderstanding the terms here or there is something I'm missing. submitted by /u/NepNep_ [link] [comments]  ( 10 min )
    Localhost as API for Stable Diffusion Model? [D]
    I want to make a website which uses my trained stable diffusion model but i dont want to deploy it to replicate yet and run it locally for testing. is there any easy way to get the model working as a api? maybe someone also has a guide/tutorial for it? would appreciate any help! submitted by /u/Overall-Cry9838 [link] [comments]  ( 9 min )
    [D] Companies publishing research papers
    Hi Folks! Does anybody here know of companies in and around Chicago that invests in publishing ML/AI research/conference papers? Thanks! submitted by /u/karanbond007 [link] [comments]  ( 9 min )
    [D] Unique idea for handwriting synthesis
    i saw bunch of handwriting synthesis projects using generative ai to recreate handwriting but the issue with them is they require quite a lot of computational power to train, large amount of data and its not personalised (it cannot copy anyone handwriting, it just gives a general output). So i have a unique idea(i hope its not done before), 1. Use a segmentation model to extract each word from a page 2. Separate and identify each extracted word 3. Store the word, then when its time to recreate the handwriting take the stored word and paste it . For example- If i give a handwritten sample of " a quick brown fox jump over a lazy dog" Its stores - "a" "quick " "brown".. and every letter individually like "a" "b" from “brown”, c from “quick” etc Then when i want to write "a brown dog" It takes the stored words (if not word is found combine the alphabets] and paste them together to recreate the sentences in my handwriting (I hope can explained it properly) So i want to take opinion of someone on this (will it work or not) as i dont have much experience in ML i just did a few projects on computer vision submitted by /u/Soumya1704 [link] [comments]  ( 9 min )
    [N] Python code for GenAI, including the seminal NoGAN synthesizer for tabular data
    NoGAN code is a tabular data synthesizer running 1000x faster than GenAI methods based on neural networks, and consistently delivering better results regardless of the evaluation metric (including state-of-the-art new quality metrics capturing a lot more than traditional distances), both on categorical and numerical features, or a mix of both. For details, see technical paper #29, available here. https://preview.redd.it/fxxjycjplwjb1.png?width=754&format=png&auto=webp&s=3db34e981506e2b0a50ef76b32e1c20365945769 Get the code on GitHub. #genai #syntheticdata submitted by /u/MLRecipes [link] [comments]  ( 9 min )
    [P] Poker Agent Baseline
    Hi all, looking for a baseline / prior work to compare against for building a No Limit Texas Hold 'Em agent. Seems like Libratus, Pluribus, DeepStack, etc. are all closed source. Has anyone made an open-source Poker agent that achieves somewhat reasonable performance? submitted by /u/YodelingVeterinarian [link] [comments]  ( 9 min )
    [N] Blog: Strategies for effective AI/LLM cost management
    For those of you knee-deep in cloud infrastructure for AI/LLM projects, you know the cost complexities all too well. This guide from Yotascale delves into proven strategies that can help you navigate these challenges like a pro. Read the blog post here: https://yotascale.com/blog/the-enigma-of-ai-cloud-costs-strategies-for-effective-management/ submitted by /u/More_Knowledge2000 [link] [comments]  ( 9 min )
    [P] Out-of-the-box FP8 training (nanoGPT demo)
    The latest gen of AI chips can do FP8 compute, but making the most of this isn't straightforward - just naïvely inserting FP8 casts causes training to fail (e.g. grads underflow). To fix this I've been working on a method called unit scaling, which I demo in this notebook: github.com/graphcore-research/out-of-the-box-fp8-training.ipynb With a one-line code change (model = unit_scale(model)) FP8 training now matches the loss of FP32. It works by re-scaling operations in the fwd & bwd pass so that training starts with all tensors in the centre of the numerical range (see visualisations in notebook), with negligible overheads. Hopefully people find this useful in getting the most out of their FP8 hardware. submitted by /u/thecharlieblake [link] [comments]  ( 9 min )
    [P] Ideas for projects using Azure ML
    Heya! I'm studying for DP-100, Azure Data Scientist Assis. certification. All I have are study materials and guides. It's great (slightly overwhelming tho), but I learn better with practice than theory. Any ideas for projects using Azure Portal that could be a cool way to learn more on Data Science, ML, and obviously Azure? Appreciated! submitted by /u/Zealousideal-Car6009 [link] [comments]  ( 9 min )
    [D] Coral accelerator module
    Has anyone bought some coral stuff? For years I've wanted to buy some coral stuff from Google but every time I try, no seller has stock, it's my bad luck or it's discontinued, if not, does anyone know when there will be restock? What interests me mainly is an accelerator module, the microchip itself, does anyone know where I could get it? submitted by /u/sinnstral [link] [comments]  ( 9 min )
    [D] Question Answering on specific corpus
    Hi, I'm a machine learning practitioner but I've only mostly worked with classical ML models and I'm newly interested in larger NLP models for a specific task. I was wondering if it's possible to train a model that specifically does: Question answering On a specific document set Without having to supply the specific document to look for the answer for* OR With the context being much bigger than the question *by this I mean I've looked at stuff like Huggingface's Question Answering tutorials, but mostly the question is like 1 sentence and the context is also like a sentence or two. Basically let's say there's like a document that's a few hundred pages long detailing some rules of conduct, and I'd like to ask question about the rules and how to proceed in specific scenarios. I think I'm looking for extractive question answering, but I have some questions. I get that I'd need to do some ranking and then pass the most likely documents as context, but would that even work if the question is just a sentence and there's a whole corpus of multipage documents to look through? I'm pretty sure cosine similarity would be useless at that point, passage ranking might work but I haven't read up on how that works. I think my questions are: Is there a model that does question answering on a specific, big corpus of documents? What models should I look into? Are there any resources you'd recommend for reading into the topic? Thank you! submitted by /u/lifesthateasy [link] [comments]  ( 9 min )
    [P] LLM Apps Are Mostly Data Pipelines
    My colleague just wrote up an article on LLM-based apps and how to use data engineering tools to help build them faster that I found really insightful. It contains a complete implementation with scraping context data from a docs website chunking it, getting embeddings via the openAI API loading it into pinecone and finally a simple Q&A interface with streamlit on top of it Here's a quick summary: LangChain and LlamaIndex are great tools for quick exploration But aren't perfect for production-grade use I think we all know the "LangChain is pointless" debate, but there's a lot of real meat to it, and Pat describes a few of them (a lot of LangChains extractors are super basic, 2-3 liners without retries etc.) LLM applications are all about moving data, extracting and enriching data (creating embeddings!) are the most expensive ones of those steps A bunch of data engineering tools are out there that make these two steps much easier, versionable, robust, and reproducible. Meltano is one such tool and Pat implemented the above described pipeline with it FWIW: The GitHub project that comes with the post is super easy to run and super modular. I just tested it and was able to modify everything for my own application within 30 mins. submitted by /u/sbalnojan [link] [comments]  ( 9 min )
    OpenAI launches fine-tuning for GPT-3.5 Turbo [N]
    OpenAI just announced a new feature: fine-tuning for GPT-3.5 Turbo, the lightweight version of GPT-3.5. This means that users can now bring their own data and train the model to perform better on specific tasks and domains. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/8chj51jobtjb1.jpg?width=862&format=pjpg&auto=webp&s=7c710837179d922435ee714572109100d98196ec Why this matters: Fine-tuning opens up new possibilities for creating customized and reliable AI solutions. Users can improve the model’s accuracy, consistency, and style by feeding it relevant data and instructions. Fine-tuning can also reduce costs and latency. Users can shorten their text prompts by embedding the instructions into the model itself, whic…  ( 10 min )
    [D] What are your opinions on the ability of GANS versus Diffusion models in 2023?
    Curious on validity of both styles of training. There is Gigagan which had lower FID than diffusion models, however I also don't know if data was fabricated or not (which happens a lot in research). Did any of you actually get the chance to test the fully trained model and compare it to Stable Diffusion or Midjourney? There is of course diffusion models which are the only commercialized products which people are actually using. Do you think Diffusion models are the way forward and hope for something newer to come out if it does or do you think there will be a resurgence in GAN models again? submitted by /u/I_will_delete_myself [link] [comments]  ( 9 min )
    [R] Endorse me on arXiv pleaasee !!
    Anyone care to endorse me on arXiv ?? CS AI or ML i would thank you forever - go to this link : http://arxiv.org/auth/endorse.php - enter this code : HCNHBO submitted by /u/Wrong_Swimming_9158 [link] [comments]  ( 9 min )
  • Open

    💡AI Opportunity?
    Hey friends … I’m interested!! Where do my database users see the opportunity for AI in your day-to-day activities? submitted by /u/Early-Pudding8100 [link] [comments]  ( 9 min )
    thought id revisit poe after not going on the app for a while.. what is this..
    submitted by /u/nicdunz [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/22/2023
    IBM taps AI to translate COBOL code to Java.[1] ElevenLabs, the viral AI-powered platform for creating synthetic voices, today launched its platform out of beta with support for more than 30 languages.[2] Amazon AI scammers blew millions on Lake Como wedding and cars, FTC alleges.[3] Facebook parent company Meta on Tuesday released an AI model capable of translating and transcribing speech in dozens of languages, a potential building-block for tools enabling real-time communication across language divides.[4] Sources: [1] https://techcrunch.com/2023/08/22/ibm-taps-ai-to-translate-cobol-code-to-java/ [2] https://techcrunch.com/2023/08/22/elevenlabs-voice-generating-tools-launch-out-of-beta/ [3] https://www.cnbc.com/2023/08/22/amazon-ai-scammers-blew-millions-on-lake-como-wedding-cars-ftc-claims.html [4] https://www.reuters.com/technology/meta-releases-ai-model-translating-speech-between-dozens-languages-2023-08-22/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Handiest way to receive feedback on rowing training.
    submitted by /u/BronxLens [link] [comments]  ( 9 min )
  • Open

    About model serialization and metadata
    Hey could anyone help me out in this question. So when we serialize a model the objects are serialized then what about the data it has like weights and architecture and dataset related information and other parameters. And also any insights on what is meant by metadata and model metadata submitted by /u/akash123608 [link] [comments]  ( 9 min )
  • Open

    SMART launches research group to advance AI, automation, and the future of work
    Mens, Manus and Machina (M3S) will design technology, training programs, and institutions for successful human-machine collaboration.  ( 9 min )
  • Open

    Persistent Systems shapes the future of software engineering with Amazon CodeWhisperer
    Persistent Systems, a global digital engineering provider, has run several pilots and formal studies with Amazon CodeWhisperer that point to shifts in software engineering, generative AI-led modernization, responsible innovation, and more. This post highlights four themes emerging from Persistent’s Amazon CodeWhisperer experiments that could change software engineering as we know it.  ( 8 min )
  • Open

    Simple Gridworld Gymnasium Environment
    SimpleGrid is a basic and simple gridworld environment compatible with Farama-Foundation's Gymnasium. https://i.redd.it/6dfro8o11vjb1.gif It is easy to use and customise and it is intended to offer an environment for quickly testing and prototyping different RL algorithms. Check it out at: https://github.com/damat-le/gym-simplegrid submitted by /u/damat-le [link] [comments]  ( 9 min )
    Help with bounded Actor-Critic Algorithm - Hyper parameters
    I'm working on solving an optimisation problem using RL and currently trying out a Bounded Actor-Critic agent. I tuned the hyperparameters of my agent using Bayesian optimisation running each iteration of the optimiser for 1000 episodes. The agent is performing well using the tuned hyperparameter when run for 1000 episodes, exceeding the performance of my previous Q-learning agent. However, when run for longer iterations it finds the optimal policy but later deviates and converges to a suboptimal policy leading to really poor overall performance. I suspect the issue might be the high learning rate of the actor and the low learning rate of the critic. I tried using a basic decay schedule for the actor's learning rate and it seems to improve the stability. However, the performance is lower than the Q-learning agent. Why is this happening iyo? Any ideas on how to fix it is appreciated. Picture of rewards for reference: ​ Reward v Iteration submitted by /u/WengerIn420 [link] [comments]  ( 9 min )
    Best way/data structure to store a MDP?
    In your experience, what is the best data structure to store a Markov Decision Process, it could be built-in like list, tuple, set, dict, or module-related np.array, or others in CS field like heap, queue, etc.? ​ https://preview.redd.it/08dlubqacrjb1.png?width=969&format=png&auto=webp&s=b9ed64a935b12cae9954021dc435d81a2569596e submitted by /u/Neither_Canary_7726 [link] [comments]  ( 9 min )

  • Open

    AI conferences
    just 2 quick questions: what is a good site to know about and keep track of top AI conferences? Is it true that aside from mainstream AI conferences, we can also send AI/ ML papers to field specific conferences (like biotech, natural science etc)? - and again how to find these field specific conferences? ​ Cheers! submitted by /u/Icy-Bid-5585 [link] [comments]  ( 9 min )
    Try out my AI generated crossword puzzles
    I would love feedback. They are FOR SURE not perfect. I wonder if anybody is good enough at crosswords to overcome the rough edges. https://nickvinden.com/crossword/ submitted by /u/SameerMohair [link] [comments]  ( 9 min )
    Political prompts banned on AI image generators
    All I want to do is make a pic of Donald Trump dressed in a Japanese Shogun’s outfit to send to my economist friends but every platform I’ve tried has a stroke because they all think I’m trying to create some disinformation campaign. I don’t care if it’s not photorealistic, honestly it looking like a traditional 18th century Japanese painting would be funnier. Are we never going to be able to use these tools to create anything even close to political satire? submitted by /u/Inception_Bwah [link] [comments]  ( 9 min )
    AI for E-Mail
    Is there a way to use Bard or ChatGPT to have auto response to Outlook emails and then send it to an "important" folder for me to check later. Or if customer is requesting for a quote, then send it to a "quotes" folder. Like, just a standard reply like "hey thanks for your message, I'll get back to you in 24hrs". submitted by /u/lasagnaHardG [link] [comments]  ( 9 min )
    Can AI help to make better travel Plans?
    submitted by /u/biosbetoub [link] [comments]  ( 9 min )
    music tool
    can someone pls point me in the direction of a tool that you can plug multiple mp3s into and it generates mp3s that are hybrids of the them all? TIA submitted by /u/SensibleInterlocutor [link] [comments]  ( 9 min )
    AI’s Impact on Household Robots and its Efficiency in Reducing Planning Duration by 50%
    Not too long ago, the concept of having robots in our households existed only in works of science fiction. However, as time has progressed, household robots have become a tangible reality that is significantly impacting the way we handle our everyday responsibilities. Moreover, the integration of Artificial Intelligence (AI) has enabled these robots to become increasingly intelligent and effective. Comprehending Household Robots: Household robots are a type of robotic device made to aid us with different activities in our houses. They are available in different forms and sizes, each customized to specific purposes. Cleaning robots efficiently sweep and mop floors, cooking assistants flawlessly prepare meals, security robots supervise and protect our homes, and companion robots provide c…  ( 12 min )
    Preparing for AI in a factory setting
    I'm interested in applying AI techniques in my factory. But the facility is far behind the times. We have very little digital data. We only have one PLC system, and a handful of other sensors in the facility. So I don't think they are useful yet. I'm looking to upgrade the factory by buying more sensors where appropriate, and implementing statistical control. I'll start slow focusing on areas we need to improve rather than start sticking sensors to things without purpose. Eventually I hope to have enough data that we can apply AI analysis techniques. What should I do now to make it easy to apply those techniques in the future? submitted by /u/Aggressive_Ad_507 [link] [comments]  ( 9 min )
    WoooW! YouTube takes over the lead for the AI industry age!
    it was only a matter of time: the fbig labels can't repeat the mistakes of the mp3 file-sharing era - yet the AI development threatens the industry. now YouTube has set up a set of rules and has one of its strongest partners: Universal Music. Either you join the incubator - or you leave the market. What do you think? https://kinews24.de/music-industry-ai-how-youtube-and-universal-redefines-the-music-industry ​ submitted by /u/myreddit333 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/21/2023
    Computer scientist Stephen Thaler’s bid to secure a copyright registration for an artwork created by artificial intelligence has been shot down for at least the third time by a Washington, D.C. court.[1] Scientists from the Korea Advanced Institute of Science & Technology (KAIST) have developed a humanoid robot capable of flying an aircraft without majorly adjusting the cockpit.[2] Zoom has made significant advancements in its artificial intelligence (AI) technology as it aims to empower customers to work smarter in a hybrid work environment.[3] Eye scans powered by AI could detect Parkinson’s disease in people before they have symptoms, a study has suggested.[4] Sources: [1] https://news.artnet.com/art-world/court-shot-down-ai-art-copyright-again-2352452 [2] https://www.giantfreakinrobot.com/sci/robots-flying-planes.html [3] https://www.pymnts.com/artificial-intelligence-2/2023/zoom-taps-ai-to-empower-customers-in-safe-hybrid-work-environment/ [4] https://www.rte.ie/news/2023/0821/1400924-ai-parkinsons/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    How Will We Know When AI is Conscious?
    Historical Perspective: The program "Eliza" was mentioned as one of the earliest attempts to simulate conversation with a machine. Its design was basic, yet people attributed human-like characteristics to it. This leads to a fundamental question: Will machines ever appear conscious to us? And if so, is appearance of consciousness sufficient? Capabilities of Modern AI: Systems like ChatGPT can generate clever and creative outputs, but they fundamentally operate on pattern recognition and prediction rather than true understanding. The Implications of AI Evolution: If the costs and resources for AIs decrease, we could see a proliferation of AI systems with varying goals. These AI systems can be used for manipulative or malicious purposes, like spreading misinformation, which can have real-world consequences. The Ethics of Conscious Machines: There is a distinction between machines appearing conscious and actually being conscious. If machines are truly conscious, they come with ethical obligations. Machines that only appear conscious could still manipulate human emotions without any genuine understanding or reciprocation. The Nature of Consciousness: The lesson discussed the difference between sentience, sapience, and consciousness. There's still much we don't understand about consciousness, making it challenging to determine if a machine can truly possess it. Safety Concerns: Aligning AI's goals with human values is critical. Misaligned AI could take actions detrimental to humanity. We need to be cautious about releasing powerful AI systems without proper safeguards. The Future: If we ever confirm that machines can be truly conscious, it will open a new chapter in the history of life and evolution. This could lead to a new era where we become builders of minds. submitted by /u/nicdunz [link] [comments]  ( 9 min )
    1x’s robot is gonna step on someones pets foot on accident and then 1x is gonna get sued even tho we do it all the time
    ^ submitted by /u/nicdunz [link] [comments]  ( 9 min )
    From cattle to coding: The inspiring journey of a Peruvian engineer helping Google translate Aymara to English using AI
    submitted by /u/egusa [link] [comments]  ( 9 min )
  • Open

    Increase in Loss and Stagnant Reward in DQN Training using Stable Baselines3
    I am attempting to train an agent using StableBaselines3 on a custom environment. I am using the DQN algorithm with default parameters. However, I have noticed that after a certain point, my loss values start to consistently increase, while the reward remains relatively unchanged or it just oscillates. I have made various attempts to adjust the parameters on my own, but I have not been successful in resolving this issue. I would greatly appreciate it if someone could provide guidance on what might be causing this behavior and offer suggestions on how to address this problem. submitted by /u/uonliaquat [link] [comments]  ( 9 min )
    [P] PettingZoo 1.24.0 has been released (including Stable-Baselines3 tutorials)
    PettingZoo 1.24.0 is now live! This release includes Python 3.11 support, updated Chess and Hanabi environment versions, and many bugfixes, documentation updates and testing expansions. We are also very excited to announce 3 tutorials using Stable-Baselines3, and a full training script using CleanRL with TensorBoard and WandB. Tweet: https://twitter.com/FaramaFound/status/1694095374569394447 Release notes: https://github.com/Farama-Foundation/PettingZoo/releases/tag/1.24.0 For more information about the Farama Foundation, see https://farama.org/, or join our discord server: https://discord.gg/nhvKkYa6qX submitted by /u/elliottower [link] [comments]  ( 9 min )
    Summary for the Sutton and Barto book
    Is there a good summary online out there for the Sutton and Barto book? submitted by /u/immer_hungrig [link] [comments]  ( 9 min )
    Continue training after slight modification to the environment
    I trained for a few iterations, tested my model, and noticed an unwanted behaviour. This unwanted behaviour can be fixed by a slight adjustment in the reward scheme in the environment. I imagine this is very common - when you guys are in such situations, do you retrain from scratch or continue training the model from the last checkpoint. Is this dependant in any way on which Policy algorithm is used? Or perhaps on the parameters set that could influence this e.g. gamma? Thanks! submitted by /u/WagnerianJLC [link] [comments]  ( 9 min )
  • Open

    [R] Retriever Model on Wikipedia
    I am new to RAG. How do you guys build a retrieval using wiki data? What embedding to use? How to index? I want to use it for open ended QA submitted by /u/rodeowrong [link] [comments]  ( 9 min )
    [D] What's next for CV after bounding boxes?
    Real-world usage of CV has been dominated by object detection (a la YOLO). Labeled bounding boxes are the cornerstone of the biggest CV applications in manufacturing and surveillance. But what's next? We (a team of researchers at Stanford) just put out a short blogpost on Video Analysis Beyond Bounding Boxes. We would love any feedback on this vision for what the future of CV could look like! submitted by /u/calebwin [link] [comments]  ( 9 min )
    [D] Getting formatted, variable output from LLM
    I have been trying to extract information from a LLM in a formatted way. I have tried jsonformer and clownfish, but can't seem to get it to work. In particular, I am trying to extract what Vicuna 7B knows about cheese. If I ask it what kinds of cheeses it knows, it will list them. If I ask about a particular kind of cheese, it will tell me about that kind. USER: List different kinds of cheese ASSISTANT: There are many different kinds of cheese, including: Cheddar Mozzarella Parmesan Swiss Gouda Emmental Gruyère Camembert Brie Blue cheese Goat cheese Feta Ricotta Roquefort Pepper Jack Fontina Provolone Pecorino Mascarpone Yarg USER: What kind of cheese is yarg? ASSISTANT: Yarg is a type of soft cheese that is traditi…  ( 10 min )
    [R] Open Sourcing a Data Science Analytics Platform
    Question to the ML builders: Would you like to use a user-friendly data science analytics platform if we open-source it? Lyzr is to data analysts and business users what Streamlit is to data scientists and ML engineers. We're on the verge of launching an open-source version of our new insights platform, www.lyzr.ai, explicitly crafted with the analyst community in mind, and we'd be honored if you could test it and share your invaluable feedback. It may currently seem like a mere GPT wrapper, but trust us, countless hours and dedication have gone into making this more than just that. Why did we create it? There is just 1 data scientist for every 100 data analysts (as per GCP data analytics head). We envision a world where data analysts and business users have the tools to dabble more in to data science. Our platform also aims to simplify the 0-75th percentile of descriptive statistics for data scientists, allowing them to concentrate on building more complicated data science models. The cherry on top? We're gearing towards an open-source launch. We believe in the power of collective genius and want everyone to benefit from what we've built and further enhance it collaboratively.Please let me know if you are interested in giving it a spin. Will DM the link. And let us know what you think! What features resonate with you? What's missing? Would you use it if open-sourced? Your feedback will not only be appreciated, but it'll also be instrumental in shaping the future of this platform. Thank you and looking forward to your insights! submitted by /u/sivasurendira [link] [comments]  ( 9 min )
    [D] Fine-tuning keras_ocr
    Hello everyone. I'm trying to fine-tune an existing OCR model called keras_ocr. In order to do so, I followed the instructions provided in the model documentation, which can be found at this link: https://kerasocr.readthedocs.io/en/latest/examples/fine_tuning_recognizer.html. Unfortunately, I encountered an error when I attempted to fit the model using the provided code. Could you please provide me with specific details about the error message I received? and how I can solve it. Epoch 1/1000 --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () 4 tf.keras.callbacks.CSVLogger('recognizer_borndigital.csv') 5 ] ----> 6 recognizer.training_model.fit( 7 training_gen, 8 steps_…  ( 9 min )
    [P] Multivariate time-series analysis and annotation tool
    I was working on a time-series classification problem for which we had to label the data ourselves. To visualize/annotate and manipulate the data, I created a tool built on top of Matplotlib and Pandas using PySide6. I thought it might be helpful for any people that are working on time-series data. https://i.redd.it/hw65zxdrfpjb1.gif The only requirement for the data is the presence of a "DateTime" column - the tool supports loading .xlsx, .csv and pickled-dataframe files. The source code is available on GitHub, and the app can also be installed from PyPi (pip install MVTS-Analyzer - tested on windows/ubuntu with > Python3.8). Any feedback is of course welcome. submitted by /u/Woutaha [link] [comments]  ( 9 min )
    [R] Graph of Thoughts: Solving Elaborate Problems with Large Language Models - ETH Zürich 2023
    Paper: https://arxiv.org/abs/2308.09687 Github: https://github.com/spcl/graph-of-thoughts Abstract: We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges correspond to dependencies between these vertices. This approach enables combining arbitrary LLM thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, or enhancing thoughts using feedback loops. We illustrate that GoT offers advantages over state of the art on different tasks, for example increasing the quality of sorting by 62% over ToT, while simultaneously reducing costs by >31%. We ensure that GoT is extensible with new thought transformations and thus can be used to spearhead new prompting schemes. This work brings the LLM reasoning closer to human thinking or brain mechanisms such as recurrence, both of which form complex networks. https://preview.redd.it/jy60udt8cpjb1.jpg?width=1523&format=pjpg&auto=webp&s=d91e1a1784f236d56cacae666ff2f88f3b810556 https://preview.redd.it/d1d9t5u8cpjb1.jpg?width=925&format=pjpg&auto=webp&s=5eb7f59a6d292687ca41974c4c4448e233969748 https://preview.redd.it/7ywrlht8cpjb1.jpg?width=932&format=pjpg&auto=webp&s=44bb76ed8d40d8c9cff6d0fc575ce58635915110 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Are there any books that would help with implementing the ML/Deep Leaning algorithms?
    As the title is saying, I have experience with ML enough to be able to implement things myself (as a way to make my CV better, and for my academic future). I want to start implementing papers, but before doing that I need to know where to even start? Are there any books that can help me with that? Implementing the algorithms from scratch so I can build on that? submitted by /u/theonewhoask11 [link] [comments]  ( 9 min )
    [R] QuIP: 2-Bit Quantization of Large Language Models With Guarantees - Cornell University 2023
    Paper: https://arxiv.org/abs/2307.13304 Github: https://github.com/jerry-chee/QuIP Abstract: This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight. https://preview.redd.it/uu034fa6apjb1.jpg?width=927&format=pjpg&auto=webp&s=c22148c1ba6d57e9690b9c46aa3d433bf0023b47 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Face Recognition: What's The State Of The Art Technology Out There?
    ​ Hi. I want to know how can I make a python script/app which will be able to detect and then recognize faces at a certain distance (let's say 5-10feet) Real-Time from CCTV camera. It should also be able to recognize Unknown faces correctly. One major problem I am facing is that unknown faces are being labeled as known faces even though their face looks nothing like that. Also, it should be able to recognize at least 500-1000 different faces correctly. ​ What are some good git repos/ latest technology that I should look into? Also, I want to know how does Hikvision implement face recognition in their newer cameras? What model do they use to recognize faces? ​ ​ ​ https://github.com/ageitgey/face_recognition : I have tried this out. It's easy to code and accurately recognizes faces. The problem is it can't even detect faces 1 feet away from the camera. ​ https://github.com/timesler/facenet-pytorch (FaceNet & MTCNN) : This can detect and recognize faces at a distance, but the problem is it can't recognize unknown faces correctly. I mean for unknown faces it always tries to label it as one of the faces from the model/ database encodings. ​ https://github.com/serengil/deepface : I have tried VGG, ArcFace, Facenet512. The latter two gave me good results. But, the problem is I couldn't figure out how to change the detection from every 5 seconds to real-time. Also, I couldn't change the camera source. (If anyone can help me with these please do). Also, it had fps drops frequently. ​ https://github.com/deepinsight/insightface: Couldn't test this yet. But in the demo YT video it shows the model incorrectly detecting a random object as a face. If someone knows how well this performs please let me know. submitted by /u/ProfessionalNovel984 [link] [comments]  ( 10 min )
    [D] what are the currently recommended approaches to detecting slips/falls in surveillance videos?
    Im familiar with the VFP290K approach but in the new world of transformers are there better approaches? submitted by /u/bluzkluz [link] [comments]  ( 9 min )
    [D] SOTA in one-shot face recognition
    What is the current SOTA in one-shot face recognition? Looking for something like FaceID but without the IR illuminator/camera data. I see that GhostFace and ArcFace are the SOTA right now for face recognition but it's for generic face recognition and not one-shot submitted by /u/jayshenoyu [link] [comments]  ( 9 min )
    [D] RLHF vs RLAIF for language model alignment
    Hey everyone, As most of you here know, RLHF became famous with the release of ChatGPT. While LLMs were capable as general-purpose agents before the release of ChatGPT, RLHF was the crucial factor that differentiates it from previous models. With the increasing popularity of AI assistants, we've seen recently how they can be manipulated to produce harmful and unethical outputs. Anthropic devised a new method for LLM alignment called Constitutional AI, which is closely tied to their concept of Reinforcement Learning from AI Feedback. Instead of using human feedback to train the LLM, RLAIF uses AI feedback. I wrote this article on RLHF vs RLAIF for language model alignment that I thought you might enjoy. It's not super technical and seeks to serve as an overview of the inspiration for creating RLAIF, so I hope it will be helpful even if you don't work in NLP. Here are some highlights: RLAIF constitutes a Pareto improvement over RLHF, simultaneously improving helpfulness and harmlessness RLAIF (in this formulation) incorporates a constitution of principles by which it should abide RLAIF is much more scalable than RLHF as a means of supervising alignment ​ https://preview.redd.it/d1i6x8kiqojb1.png?width=960&format=png&auto=webp&s=93c60080ae146dda07990ad9dc8b94e3bbec2d0e submitted by /u/SleekEagle [link] [comments]  ( 9 min )
    [R] Releasing IDEFICS, the first open state-of-the-art visual language model at the 80B scale!
    Hugging Face is releasing IDEFICS, an 80B open-access visual language model. IDEFICS is a reproduction of Flamingo, a multimodal model developed by DeepMind, which has not been released publicly. The model is built solely on publicly available data and models. It is the first visual language model of this scale available in open-access! IDEFICS was partly trained on OBELICS, a new open large-scale dataset of interleaved image-text documents comprising 141M web pages extracted from Common Crawl, 353M associated images, and 115B text tokens. Training the model was a bumpy trip, and this knowledge sharing memo compiles some of the learnings. Ressources: Announcement: https://huggingface.co/blog/idefics Demo: https://huggingface.co/spaces/HuggingFaceM4/idefics_playground Models: https://huggingface.co/HuggingFaceM4/idefics-80b-instruct OBELICS dataset: https://huggingface.co/datasets/HuggingFaceM4/OBELICS OBELICS paper: https://arxiv.org/abs/2306.16527 Lessons learned: https://github.com/huggingface/m4-logs/blob/master/memos/README.md submitted by /u/VictorSanh [link] [comments]  ( 9 min )
    [P] VisionScript: An abstract programming language for computer vision
    Hello! I'm James and I am working on VisionScript, an abstract programming language for computer vision. With VisionScript, I want to empower people -- including everyone without any prior programming experience -- to build cool apps with vision. This weekend, I recorded a demo for VisionScript, in which I made apps that count how many cats are in an image and hides people in a video. Each app was < 10 lines of code. https://vimeo.com/856043804 VisionScript is built for the 10 year old inside of me who would have loved more visual programming languages with which to play. I want to show people the potential of programming and how you can make what you want with computers, whether it be a game that counts cats or an app that monitors how many birds flew past a tree. Those "wow" moments should come as soon as possible in one's learning experience. VisionScript is in active development. I started work on this project in July. Follow along as I add more features and explore more possibilities in making computer vision intuitive. submitted by /u/zerojames_ [link] [comments]  ( 9 min )
    [D] NeurIPS Discussion phase has ended. How was the overall experience for you ?
    I am not sure if "Discussion" was always part of the Neurips pipeline but I felt like it was a good addition (in principle). On one hand it alows the authors to present their case with more clarity. On the other hand, it does increase the overhead for the reviewers which are now required to work even harder (and for free). For me, it was a mixed bag. Most of the reviewers did engage and the discussion was indeed fruitful. However, some didn't bother to follow up on the responses to their concerns and questions. Unfortunately, also quite expected. I would definitely like to see this in the next Neurips but maybe with some tweaks and modifications keeping in mind the (unpaid) reviewers. submitted by /u/PaganPasta [link] [comments]  ( 9 min )
    [D] EMNLP 2023: Rebuttal
    Reviews for EMNLP 2023 will be released soon. Good luck to everyone and we could use this post for discussion about the reviews! submitted by /u/Alliswell2257 [link] [comments]  ( 9 min )
    [D] Has anyone tried taking an AI TTS model and shoving the output into RVC?
    I'm working on a fun side project of AI TTS in python (that also features chatGPT). I was initially using Elevenlabs and the quality of the voices was incredible. But I quickly realized that it was a very expensive API. This has led me down exploring open source alternatives that I can run locally and self host to save money on API costs (or I guess find a cheaper API but I think self hosting long term will be way cheaper.) The general consensus seems like the only thing comparable to Elevenlabs is a really well tuned tortoiseTTS model or feeding the output of an AI TTS model into RVC to make the speech sound cleaner and less robotic. Here's the things I've found in my research: tortoiseTTS+ RVC v2 - This video seemed pretty promising but I'm a little worried the response times will be…  ( 10 min )
    [D] High-frequency time-series signal classification and forecasting SOTA
    I'm working with a high-frequency time-series signal (up to 8 kHz). Most of the SOTA I found in Papers With Code and this review work for low frequency dataset. I want to classify and forecast the raw signal if possible. Are there any methods that work? Or should I go with feature extraction and use the feature to classify or forecast? Thanks for the advice. submitted by /u/puddit [link] [comments]  ( 9 min )
    [D] Small utilities you use for python experimentation?
    Hello, I'm doing some experimentation around deep learning, and I've written a small helper tool, run(fn, description). When I run this command, it will just snapshot the fn code into a python file and prepend the description and output in a comment. Also appends to a log file with [date, description, py filename]. This works well when I use the VSCode's python mode. I feel like this is pretty simple and most likely there are better utilities like this. What tools or utilities or do you use? Some issues I found: my data loader was outside of fn and didn't get captured i forgot to export the opt_state so I couldn't resume learning after I terminated the run submitted by /u/windoze [link] [comments]  ( 9 min )
    [D] WACV 2024 Round-1 Paper Notification
    WA, B, B, with one B saying willing to increase the score if an additional experiment is provided and the other B saying the approach is not that novel....do I have a chance? How did you all do? submitted by /u/Individual-Bend-9690 [link] [comments]  ( 9 min )
  • Open

    Announcing Amazon S3 access point support for Amazon SageMaker Data Wrangler
    In this post, we walk you through importing data from, and exporting data to, an S3 access point in SageMaker Data Wrangler.  ( 6 min )
    Machine learning with decentralized training data using federated learning on Amazon SageMaker
    In this post, we discuss how to implement federated learning on Amazon SageMaker to run ML with decentralized training data.  ( 13 min )
  • Open

    Language to rewards for robotic skill synthesis
    Posted by Wenhao Yu and Fei Xia, Research Scientists, Google Empowering end-users to interactively teach robots to perform novel tasks is a crucial capability for their successful integration into real-world applications. For example, a user may want to teach a robot dog to perform a new trick, or teach a manipulator robot how to organize a lunch box based on user preferences. The recent advancements in large language models (LLMs) pre-trained on extensive internet data have shown a promising path towards achieving this goal. Indeed, researchers have explored diverse ways of leveraging LLMs for robotics, from step-by-step planning and goal-oriented dialogue to robot-code-writing agents. While these methods impart new modes of compositional generalization, they focus on using lang…  ( 92 min )
    Language to rewards for robotic skill synthesis
    Posted by Wenhao Yu and Fei Xia, Research Scientists, Google Empowering end-users to interactively teach robots to perform novel tasks is a crucial capability for their successful integration into real-world applications. For example, a user may want to teach a robot dog to perform a new trick, or teach a manipulator robot how to organize a lunch box based on user preferences. The recent advancements in large language models (LLMs) pre-trained on extensive internet data have shown a promising path towards achieving this goal. Indeed, researchers have explored diverse ways of leveraging LLMs for robotics, from step-by-step planning and goal-oriented dialogue to robot-code-writing agents. While these methods impart new modes of compositional generalization, they focus on using lang…  ( 92 min )
  • Open

    Machine-learning system based on light could yield more powerful, efficient large language models
    MIT system demonstrates greater than 100-fold improvement in energy efficiency and a 25-fold improvement in compute density compared with current systems.  ( 9 min )
  • Open

    Meta Releases SeamlessM4T, a Multimodal AI Model for Speech and Text Translation
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Watching Neural Networks Learn
    submitted by /u/keghn [link] [comments]  ( 9 min )
    Tensorflow learning process local minimum
    I am teaching a mrc_lstm neural network on some time series data. I am using Tensorflow with Keras. When I change the sampling from 30 minutes to 10 minutes (in my data) I experience something strange. The learning process stucks on local minimum. 3073/3073 [==============================] - 103s 31ms/step - loss: 0.7989 - accuracy: 0.5153 - val_loss: 0.6954 - val_accuracy: 0.5111 - lr: 2.0000e-04 Epoch 2/2000 3073/3073 [==============================] - 100s 31ms/step - loss: 0.6932 - accuracy: 0.5156 - val_loss: 0.6932 - val_accuracy: 0.5111 - lr: 2.0000e-04 Epoch 3/2000 3073/3073 [==============================] - 99s 31ms/step - loss: 0.6927 - accuracy: 0.5156 - val_loss: 0.6929 - val_accuracy: 0.5111 - lr: 2.0000e-04 Epoch 4/2000 3073/3073 [==============================] - 99s 31ms/step - loss: 0.6927 - accuracy: 0.5156 - val_loss: 0.6930 - val_accuracy: 0.5111 - lr: 2.0000e-04 Epoch 5/2000 3073/3073 [==============================] - 99s 31ms/step - loss: 0.6927 - accuracy: 0.5156 - val_loss: 0.6929 - val_accuracy: 0.5111 - lr: 2.0000e-04 BUT! This only happens sometimes. When I restart the larning process it sometimes escapes the local minimum. What could be the problem here? I can only think about the problem with weight initialization. If I am lucky enough I find good weights and if not I am stuck. This is after the restart: 3073/3073 [==============================] - 102s 31ms/step - loss: 0.7905 - accuracy: 0.5201 - val_loss: 0.6966 - val_accuracy: 0.5557 - lr: 2.0000e-04 Epoch 2/2000 3073/3073 [==============================] - 100s 31ms/step - loss: 0.6706 - accuracy: 0.5930 - val_loss: 0.6637 - val_accuracy: 0.6289 - lr: 2.0000e-04 Epoch 3/2000 3073/3073 [==============================] - 99s 31ms/step - loss: 0.6515 - accuracy: 0.6234 - val_loss: 0.6507 - val_accuracy: 0.6607 - lr: 2.0000e-04 The other thing that I am thinking of is too much of a regularization. But tuning it did not give me immediate results. submitted by /u/Acrobatic_Ad6507 [link] [comments]  ( 10 min )
    Latent Space: Visualizing the complex mind of neural nets
    submitted by /u/keghn [link] [comments]  ( 9 min )
  • Open

    DSC Weekly 22 August 2023
    Announcements Top Stories In-Depth The post DSC Weekly 22 August 2023 appeared first on Data Science Central.  ( 20 min )
    How organizations can prepare for rogue AI
    By Ari Kamlani, Senior AI Solutions Architect and Principal Data Scientist at Beyond Limits Rogue AI, or an autonomous artificial intelligence system that commits potentially dangerous acts, may take many forms and can bring with it varying levels of severity, threats, or harm.  Intelligent systems, while incredibly useful and full of great potential, can still… Read More »How organizations can prepare for rogue AI The post How organizations can prepare for rogue AI appeared first on Data Science Central.  ( 24 min )
    Top 4 generative AI benefits for business
    In the midst of the Fourth Industrial Revolution, generative AI emerges as a beacon of transformative potential. While AI’s capabilities in automation, recommendation, and prediction have been widely acknowledged, its generative functions have opened new horizons for businesses globally. This article seeks to shed light on the benefits of generative AI, elucidating how they’re altering… Read More »Top 4 generative AI benefits for business The post Top 4 generative AI benefits for business appeared first on Data Science Central.  ( 20 min )
    The use of Big Data Analytics for better growth and innovation
    Innovations in technology are changing the rules when it considers the use of big data and analytics for better growth. Advanced software systems are highly decreasing analytics time, hence offering companies the potential for making quick decisions that will help in boosting revenue, mitigating costs and stimulating growth. This provides a competitive advantage to the organizations… Read More »The use of Big Data Analytics for better growth and innovation The post The use of Big Data Analytics for better growth and innovation appeared first on Data Science Central.  ( 21 min )
    Modern data quality management
    Modern Data Quality refers to the process of ensuring that data is accurate, reliable, consistent, and up-to-date in today’s data-driven environment. It involves implementing advanced technologies and methodologies to maintain high-quality data that meets the needs of various data-driven applications and analytics. Importance of Modern Data Quality: Innovation: Modern data quality drives innovation by providing… Read More »Modern data quality management The post Modern data quality management appeared first on Data Science Central.  ( 18 min )
    The relationship between Big Data and AI
    Big data and artificial intelligence are able to collaborate to help organizations reap a variety of benefits. Since AI requires large amounts of data in order to learn and make decisions, it is able to utilize big data as a source of raw material. While big data can store data from various sources, AI can… Read More »The relationship between Big Data and AI The post The relationship between Big Data and AI appeared first on Data Science Central.  ( 21 min )
  • Open

    Coming This Fall: NVIDIA DLSS 3.5 for Chaos Vantage, D5 Render, Omniverse and Popular Game Titles
    On the eve of Gamescom, NVIDIA announced NVIDIA DLSS 3.5 featuring Ray Reconstruction — a new neural rendering AI model that creates more beautiful and realistic ray-traced visuals than traditional rendering methods — for real-time 3D creative apps and games.  ( 8 min )
    NVIDIA Debuts AI-Enhanced Real-Time Ray Tracing for Games and Apps With New DLSS 3.5
    The latest advancements in AI for gaming are in the spotlight today at Gamescom, the world’s largest gaming conference, as NVIDIA introduced a host of technologies, starting with DLSS 3.5, the next step forward of its breakthrough AI neural rendering technology. DLSS 3.5, NVIDIA’s latest innovation in AI-powered graphics is an image quality upgrade incorporated Read article >  ( 6 min )

  • Open

    Help defining environment with complex action space
    As said on the title, I'm working on a personal MARL project with a high-dimensional and continuous action space. The environment is designed to give positive rewards to actions between some moving limits of the action range, and negative rewards to the actions outside of those limits. For example: Global action range: (0, 1000) Desired action range for first 100k steps: (0, 10) Desired action range for 100-200k steps: (30, 40) ... Therefore, the main challenge of the environment is that actions with positive rewards on certain stage of the environment would return negative rewards on the following stages. How should I define the actions of the agent? I've tried the following methods without success: Simply scale actions between 0 and 1000 and hope that agents learn the moving distribution of rewards Transform actions to percent variations and scale actions over a non-observed moving average (I tried adding the moving average to the observations but the results stayed the same) Observations do consider a dimension that serve to differentiate when a distributional shift happens Also, I've tried using SAC and DDPPG to model agents Feel free to share any comments or suggestions. Thanks! ​ submitted by /u/stinoco [link] [comments]  ( 9 min )
    "Diversifying AI: Towards Creative Chess with AlphaZero", Zahavy et al 2023 {DM} (diversity search by conditioning on an ID variable)
    submitted by /u/gwern [link] [comments]  ( 9 min )
    "Trainable Transformer in Transformer (TinT)", Panigrahi et al 2023 (architecturally supporting internal meta-learning / fast-weights)
    submitted by /u/gwern [link] [comments]  ( 9 min )
    How do you know if a problem is well suited for reinforcement learning?
    Is there a good way to think about how to determine when to use RL vs. other machine learning methods like deep neural nets or supervised learning? Specifically, when is RL not a good solution to a problem? For example, I am creating a project where I have data from a wearable device (Heart rate data, calories burnt, sleep data, etc.) and discrete mood measurements from 1 to 5 that occur every 15 minutes. I want to use the wearables data to try and predict the mood values. I did research in applied RL this Summer so I was thinking about using RL for this project because it is interesting to me and I have experience with it but I am unsure if it would be a god fit. I was thinking I would use some kind of policy gradient method. The wearables data could be set up as states where each state could be something like: s_(t) = { heart rate at this timestep, calories burnt up to this time in the day, hours of sleep last night, body temperature at this timestep, etc. } and then the reward could be the negative absolute difference between the actual mood value at that timestep and the mood that the agent selects as its action or something like that. I don't really think RL is a good fit here but I am curious what others think and I'm wondering if someone could explain why it isn't or why it could be possible. submitted by /u/lifelifebalance [link] [comments]  ( 9 min )
  • Open

    Can the chi squared test detect fake primes?
    This morning I wrote about Dan Piponi’s fake prime function. This evening I thought about it again and wondered whether the chi-squared test could tell the difference between the distribution of digits in real primes and fake primes. When data fall into a number of buckets, with a moderate number of items expected to fall […] Can the chi squared test detect fake primes? first appeared on John D. Cook.  ( 5 min )
    Mastodon account
    I have an account on Mastodon: johndcook@mathstodon.xyz. Note that’s @math… and not @mast… One advantage to Mastodon is that you can browse content there without logging, while Twitter is becoming more of a walled garden. You can browse my account, for example, by going to the URL https://mathstodon.xyz/@johndcook There’s hardly any content there at this […] Mastodon account first appeared on John D. Cook.  ( 5 min )
    Fake primes
    Someone asked on Math Overflow about the distribution of digits in primes. It seems 0 is the least common digit and 1 the most common digit. Dan Piponi replies “this is probably just a combination of general properties of sets of numbers with a density similar to the primes and the fact that primes end […] Fake primes first appeared on John D. Cook.  ( 5 min )
  • Open

    [D] What's the best alternative for Vertex Al for the moment in 2023
    Hi, Can anyone suggest a good platform to deploy ml models like Vertex AI? I can't use Vertex AI because I have a lot models, and I can't seem to run them on a shared resource pool with 2 gpus because there is a bug in Google infrastructure which I signaled and they responded. And what really didn't like is the limit of 60 seconds per call, I am deploying embeddings models and I want to embed a large text chunks, and 90% of the time it fails with the timeout problem. Thanks. submitted by /u/YoussefBenhammouda [link] [comments]  ( 9 min )
    [D] What are the limitations of the various SG MCMC methods?
    To me, it seems amazing that something super close to SGD(for example SGLD) can actually sample from the posterior and I am not sure why these methods are not used more often. What are the practical limitations of these methods that prevent them from being used? I have read the literature around HMC and incompatibility with mini batching but what about other variants? Are there any interesting settings where they work well? submitted by /u/Dangerous-Flan-6581 [link] [comments]  ( 9 min )
    [P] LLM/model for image sequence prediction?
    Hi all - I'm working on a simple pattern recognition project that takes in several sequential inputs and then comes up with (or selects) the next image in the sequence. e.g. circle, triangle, square, circle, triangle...? (= square) I was wondering if someone had a resource for an open source model that could do something like this already rather than building it up from first principles? Playing around with ImageBind atm but don't think it's the best suited tool to use. Would really appreciate any help! submitted by /u/Strange_Quark8 [link] [comments]  ( 9 min )
    [D] A short video on Latent Space Exploration
    Hello guys! So I made a video for my Youtube channel exploring the mysteries of latent space for VAE models trained on celebrity faces (the CelebA dataset). Most of the content is based on the old DFC-VAE paper (https://arxiv.org/abs/1610.00291) which really influenced me back in the day during my graduate studies. Not reinventing the wheel here, just trying to talk about something I always felt intrigued by… and a topic that I think most DL courses just skip/gloss over. In the video I discussed some really interesting stuff for understanding and using latent space embeddings, like nearest neighbor searches, cool visualizations, vector arithmetic, latent space interpolation, image manipulation, PCA explainability, etc - basically various examples of how the latent space impacts the generated content. Here’s the link in case you guys are interested! https://youtu.be/FslFZx08beM ​ submitted by /u/AvvYaa [link] [comments]  ( 9 min )
    [R] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework - Microsoft 2023 - Outperforms ChatGPT+Code Interpreter!
    Paper: https://arxiv.org/abs/2308.08155 Github: https://microsoft.github.io/FLAML/docs/Use-Cases/Autogen/ Abstract: This technical report presents AutoGen, a new framework that enables development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools. AutoGen's design offers multiple advantages: a) it gracefully navigates the strong but imperfect generation and reasoning abilities of these LLMs; b) it leverages human understanding and intelligence, while providing valuable automation through conversations between agents; c) it simplifies and unifies the implementation of complex LLM workflows as automated agent chats. We provide many diverse examples of how developers can easily use AutoGen to effectively solve tasks or build applications, ranging from coding, mathematics, operations research, entertainment, online decision-making, question answering, etc. https://preview.redd.it/ax8h0olziijb1.jpg?width=1377&format=pjpg&auto=webp&s=3f520e2480190f6b8fb43443371bdfa0f75f7e82 https://preview.redd.it/c0fxavlziijb1.jpg?width=1520&format=pjpg&auto=webp&s=601db266f4d6cde7e47d51c191f47c798431ec50 https://preview.redd.it/yngh3slziijb1.jpg?width=974&format=pjpg&auto=webp&s=cc5a2074834291b98080e54e74556707fbc8ef38 https://preview.redd.it/7jnneplziijb1.jpg?width=1136&format=pjpg&auto=webp&s=f04ce08881169c24d669c5f9337f80ba48901926 ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] TCO calculator to determine when open source local deployment is more cost-efficient than OpenAI
    I made a calculator to compare costs of SaaS and on-prem LLM options, and I wanted to share it with you all! Turns out that deploying your own open-source LLMs has a few more hidden costs than expected. It’s been interesting to play around with comparing costs for OpenAI, Cohere, and Llama 2 70B deployment, and it turns out that cost/request is not always so advantageous for open-source local deployment. Want to contribute to this calculator to make it more accurate? We’d love your help and feedback! Here is the calculator https://huggingface.co/spaces/mithril-security/TCO_calculator, and a guide to contributing your own model with associated cost modeling here https://huggingface.co/spaces/mithril-security/TCO_calculator/blob/main/How_to_contribute.md submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [D] Tools for reading and exploring machine learning papers via ChatGPT and other Large Language Models (LLM)
    Is there any way to parse whole papers with ChatGPT or other LLMs in order to summarise their content or to have a conversation and ask questions about a paper? I am aware of the tool ArxivGPT, which is a Google Chrome plug-in but unfortunately it only uses the abstract of a paper and not the entire PDF/paper document. ​ ​ submitted by /u/solingermuc [link] [comments]  ( 9 min )
    [Discussion] SageMaker pipelines GitLab CI
    Hey everyone 👋🏻 This is my first time posting here, so I apologise if I am out of place. My team is currently utilising SageMaker pipelines to coordinate model training. In the past, we encountered issues where the pipeline was misconfigured during cloud execution, resulting in challenging-to-debug errors 🐛 I've been delving into the idea of using Localstack and SageMaker LocalPipelineSession to execute the pipeline locally prior to deployment on the cloud ☁️. I've successfully implemented this on my local machine, using pytest and pytest-bdd to craft integration tests 🧪 Building on that success, I've ventured into creating a GitLab CI job that runs these tests upon making a merge request. A peculiar aspect of SageMaker pipelines in a local setup is its reliance on Docker. To address this, I've designed a custom Docker image, enabling installation of Python, my dependencies, Docker, and Docker Compose. The job initialises LocalStack and executes the tests. Nevertheless, running these tests within GitLab has brought about Docker-in-Docker related challenges 🐳 It's been quite a frustrating experience... The SageMaker pipelines run, although unsuccessfully with silent errors 🤫 Given this context (my apologies for the length), I'm seeking advice. Is this approach worthwhile? I find myself going in circles ⭕️ Could you offer any solutions for running SageMaker pipelines in a CI environment prior to deploying to the cloud? 🙋🏻‍♂️ Thanks in advance 🙏🏼 submitted by /u/OpenShape5402 [link] [comments]  ( 9 min )
    [D] Why fine tune a 65B LLM instead of using established task specific smaller models (~200 millions)?
    I have been in the ML field since 2018 so got used to see the market over-excited about new models/paradigms. So wondering if the following is just that or I’m missing/missed something. Everywhere I look today (medium, reddit, twitter) everyone is talking about fine-tuning LLMs. How the future is taking billion size models and fine-tuning/distilling them to specialised LLMs that perform specific tasks (i.e: sentiment analysis, Q&A, summarisation). Why not just use “small” (millions vs billion size) models that are specifically fine-tuned for these final tasks instead? Any benchmarks on how LLMs perform on these down stream tasks ? or it's just that smaller models are not as accessible as an OpenAPI is ? Curious to get your view on the topics, thanks ! P.S: Example of small models (Just went on HF and picked most downloaded based on some tasks): Q&A: https://huggingface.co/deepset/roberta-base-squad2 Summarisation: https://huggingface.co/facebook/bart-large-cnn Sentiment analysis: https://huggingface.co/SamLowe/roberta-base-go_emotions submitted by /u/EnthusiasmNew7222 [link] [comments]  ( 9 min )
    [D] People who has used OpenReview, are the authors able to restore a withdrew submission?
    Long story short, this year NeurIPS in a paper which I am not really associated with, the co-authors got into a huge fight about author ordering, and one of them threatens to withdraw the submission. I'm just curious if a withdrew submission on OpenReview is able to be restored and returns to the regular review process once the withdrawal button is clicked. The paper now has all the review rebutalled. submitted by /u/SuperTankMan8964 [link] [comments]  ( 9 min )
    Writing Applied Deep / Machine Learning Proposals [D]
    Hi, Does anyone have any resources or insight they could share regarding writing applied deep / machine learning proposals. I've done a bit of reading and come up with the following outline. What am I missing? What aspects are the most important to focus on? Thanks Problem & Background Review of current relevant research, explanation of how this work will expand the body of knowledge in the field. Clear statement of the problem and how ML/DL will solves the issue at hand. Dataset Collection procedure Size of dataset to be collected Annotation procedure Algorithm/Network Architecture Aspects the algorithm / network architecture the make it well suited to the problem at hand References demonstrating promising results on similar problems Modifications that may be explored as part of the effort Data Preprocessing Cleaning Train validation test split 80%, 10%, 10% Stratification, if applicable Feature engineering, if applicable Training Strategy Tooling ( e.g. Pytorch, Tensorflow, scikit-learn) Loss function & evaluation metrics Hyperparameter optimization Compute facilities Possible challenges & mitigation strategies ​ Edit: formatting submitted by /u/rcg8tor [link] [comments]  ( 9 min )
    [P] I Made Stable Diffusion XL Smarter by Finetuning it on Bad AI-Generated Images
    https://minimaxir.com/2023/08/stable-diffusion-xl-wrong/ I fed Stable Diffusion XL examples of bad images that it itself generated and it surprisingly made SDXL behave much better to the spirit of the prompt! Also, many more demo prompt examples + results + Jupyter Notebooks! submitted by /u/minimaxir [link] [comments]  ( 9 min )
    [D] Looking for feedback on what I have written so far (a very high-level overview)! I ultimately want to create an AI-Generated Interactive online course to help teach beginners-experts how to leverage free AI and ML Models to instantly increase their capabilities. Thank you!
    Hello everyone, ​ I hope you are having a blessed day so far. I recently created an online blog post and attached its link to this post. I think I have discovered a unique new perspective on "Prompt Engineering". That will make learning to code vastly more fun as users see and can run AI-generated scripts based on their given input to the AI. After just briefly training a free publicly accessible AI. You can then in less than 3 written prompts generate vast and fairly complex programs in seconds with zero prior experience required it's truly exciting. My ultimate goal is to go more in-depth as these are just very high-level overviews to convey the concept as a whole. Next, I would like to then create a course covering how to leverage free AI and ML systems so that anyone can now learn …  ( 12 min )
    [P] A new tightly-scoped, research-focused ML subreddit
    Hello, I just created https://www.reddit.com/r/mlfundamentalresearch/ as a complement to r/machinelearning in response to the post last week. This is a very narrow space specifically focused on _fundamental ML research only_. The only outside work that can be shared on it are papers and direct links to notebooks. Past research [>3 years old] is explicitly encouraged, since much untapped value lies in it. No self-promotion whatsoever will be allowed, that can happen in other places. This includes any form of reference or link to one's own Github repo. This is meant to be an extremely functional and task-oriented research subreddit. I don't have huge expectations for this to become the size of r/MachineLearning. If there are even 20 active users then I will be happy and it will be serving its purpose. This will hopefully provide a tiny arena for those of us wishing to work on more fundamental things to coordinate. While the rules are strict, they are meant to keep the subreddit both publicly-accessible and within scope without requiring an explicit application process. Happy to answer any questions and make changes as needed, I have put up some sample posts as examples and to kickstart momentum if anyone should like to use the subreddit. I would certainly find it helpful to work with others in a community like this. Look forward to hearing what your thoughts are, if any. submitted by /u/tysam_and_co [link] [comments]  ( 9 min )
    [R] Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
    submitted by /u/hardmaru [link] [comments]  ( 9 min )
    [D] NLP Handling Abbreviations
    I'm trying to build a multi class text classifier (~200 classes). The issue with my dataset is that almost all documents are almost all examples contain a bunch of abbreviations. Abbreviations may or may not contain punctuations. I think it's affecting performance but not sure. What's the best way to handle abbreviations? Maintain a look up list and preprocess the documents? Edit: abbreviations are mostly 90% nouns and 10% adjectives. submitted by /u/tsailfc [link] [comments]  ( 9 min )
    [D] Data preparation stuck to the json creation / neuralangelo
    After going trough the data preparation from the example mov and the example guide, I correctly generated the following folder structure: PATH_TO_IMAGES |__ database.db (COLMAP databse) |__ raw_images (raw input images) |__ dense |____ images (undistorted images) |____ sparse (COLMAP correspondences, intrinsics and sparse point cloud) |____ stereo (COLMAP files for MVS) The images folder and sparse folder contain as predicted respectively some img files, and bin files for the other, but each folder inside stereo is empty, i did not receive any error during the process so i tried to go on anyway. When i then tried to run : "PATH_TO_IMAGES=toy_example_skip30 SCENE_TYPE=object # {outdoor,indoor,object} python3 projects/neuralangelo/scripts/convert_data_to_json.py --data_dir ${PATH_TO_IMAGES}/dense --scene_type ${SCENE_TYPE}" i run it, but without any errors, or log, it just never stops nor logs anything out Any help in understanding what issue may be causing this? i run normally everything else as described in the repo guide: https://github.com/NVlabs/neuralangelo/blob/main/DATA_PROCESSING.md submitted by /u/ResponsibleTie8204 [link] [comments]  ( 9 min )
    [R] Recent surveys in choice modeling/ranking?
    I’m looking to build some knowledge of recent work in choice modeling and ranking. Does anyone have recommendations of good surveys in these areas? My background is primarily in bandits and active learning, so any papers with that perspective are especially appreciated. submitted by /u/BasedAcid [link] [comments]  ( 9 min )
    [D] How to log metrics (contain loss and accuracy,...) of each epoch in aws sagemaker
    Hi everyone, I'm currently research the AI/ML model using sagemaker, i built a grocery recommendation based on customer rate behavior as a lab. I have some problem using sagemaker experiment service, i can't get the loss values and accuracy of each training epoch so that i can draw a chart for the visualization. Anyone has ideas about it, please share. Thank you. https://preview.redd.it/fzd8mz942hjb1.png?width=1853&format=png&auto=webp&s=6d75630acc3940c8fb4e4460b8a0eba8e9407b45 https://preview.redd.it/x2hpvjza2hjb1.png?width=927&format=png&auto=webp&s=f1c30944870df2d8fea182e3a6d8c70a80e60a7c submitted by /u/Open_Juice_2972 [link] [comments]  ( 9 min )
    [P] Do you want to join a motley crew who are scaling/retraining AnimateDiff for open source? AD trainer code just released!
    POM from Banodoco.ai/Steerable Motion here. A bunch of closed-source companies are building on top of Animatediff - for example, Kaiber.ai launched an impressive image2video tool - and others are working towards scaling it. My feeling is that the Animatediff approach (an unsupervised motion module on top of image gen models) is the right one for the next phase of video and I want to make sure that the absolute best version remains OSS. I'm bringing together a crew who are passionate about the space and working to round up compute resources for them to experiment with. They just released their trainer code yesterday so the time feels right. A few areas of exploration: - What if we simply scaled up the training? How would we do this? What data would we use? What resources would we need? …  ( 10 min )
    [R] DeepMind showcases iterative self-improvement for NLG
    submitted by /u/ntortellini [link] [comments]  ( 9 min )
    [R] Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model. Paper quote: "Using linear probes, we find evidence that the internal activations of the LDM [latent diffusion model] encode linear representations of both 3D depth data and a salient-object / background distinction."
    Preprint paper . I am not affiliated with this work or its authors. GitHub project. Abstract for v1: Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process − well before a human can ea…  ( 12 min )
    [P] Recommendation? Very Low Memory, Text + Tags Similarity Search
    Input Data: I'm working on a project where I will need about 100 _separate_ indexes, each containing a maximum of 1,000,000 documents to be stored. Up to 30k documents would be added/deleted each day, to each index. (This is an absolute max that I don't expect to hit often. At the high end, I expect the average to be 10k, per index.) At the very least, I would like to store document text titles (about 10-25 words), short text descriptions (about 8-12 sentences), along with about 50 fields of scalar values (ie: tags. eg: this document's "content_tags" field includes "has_author" and "has_chart"). Most of the scalar fields will have 100-500 possible value types, while one may have ~100,000 possible value types. However, each of a document's 50 scalar fields will usually have between 0-30 v…
  • Open

    Beyond data science: A knowledge foundation for the AI-ready enterprise
    Data science was a vaguely defined discipline to begin with, but it’s shaped up substantially lately. Execs now yearn to take immediate advantage of generative and other clearly useful (if currently problematic) kinds of AI.  That demand suggests an opportunity for influencers and visionaries in organizations to lobby for each organization to build an AI-ready… Read More »Beyond data science: A knowledge foundation for the AI-ready enterprise The post Beyond data science: A knowledge foundation for the AI-ready enterprise appeared first on Data Science Central.  ( 21 min )
    The impacts of quantum computing on the future of data science
    Key takeaways In an era marked by exponential technological advancements, the convergence of quantum computing and data science is a pivotal point of transformation. The synergy between these two fields promises to revolutionize how we process, analyze, and extract insights from massive datasets. With quantum computing’s unique ability to tackle complex computations at speeds previously… Read More »The impacts of quantum computing on the future of data science The post The impacts of quantum computing on the future of data science appeared first on Data Science Central.  ( 22 min )
  • Open

    BBC Earth spec ad
    submitted by /u/Grindmaster_Flash [link] [comments]  ( 9 min )
    AI Image Keywording tool 📸 🪄 ✨
    I would like to introduce a tool I've created that among other things uses davinci and chat gpt. My wife is doing photography (regular and via Midjourney), and I'm hooked on Midjourney too, so we experienced the pain of titling and keywording photos for stock websites firsthand (additionally because English is not our native language, so coming up with big lists of relevant and cool keywords is very hard). So I being a programmer decided to solve that issue :)I've created an AI tool that uses multiple AIs (including Open AI) to analyze, title and keyword images. In a few minutes, you can keyword 100 images! See the demo on the homepage https://aikeywording.com/ Screenshot from the app (all the titles and keywords on the screenshot are AI generated based on the image input): https://preview.redd.it/buvteklf2ijb1.png?width=4112&format=png&auto=webp&s=72a503435477834494869085c4a352c9d541bd91 Key features: You can upload large images, upto 40MB and 100 at a time You can enforce keywords! Those keywords would then be taken into account when generating rest of the keywords and image titles. Very useful when you have conceptual photos or something very specific which is hard for AI to recognize You can download CSVs for various websites and there is also a way to import metadata to Adobe Bridge You can try for free :) We used the tool for the past month and exclusively titled and keyworded Midjourney images using it, uploading our images to Adobe Stock website. Images sell well, so there is confirmation from the buyers that it works :)I've decided to share the tool with the world, so here it is https://aikeywording.com/ I hope others will find it useful. I would appreciate the feedback, and if there are any issues or ideas for improvements I would love to hear them! submitted by /u/dzigizord [link] [comments]  ( 10 min )
    Upload documents for summarization and querying in private manner?
    Is there a way to upload say a pdf and then ask the AI questions about it in a privacy compliant manner? Right now the only options I see are copying and pasting stuff into chat gpt but obviously this is not ideal especially from a privacy standpoint (even if you selected the option to not use your data because you never know what they will do with your data) Thanks submitted by /u/ironmen12345 [link] [comments]  ( 9 min )
    Is there an AI assistant desktop app like Braina, with option for personallity & spontaneous interaction
    When I ask Braina how she is doing, she tells me she is AI and therefore has no feelings. :) I love the idea of an AI desktop assistant, but it would be more fun with the illusion of spontaneous interaction and personality. Like the way the GTA and Skyrim npc mods work powered by ChatGtp. Probably I am just a little bit too early for this request, but who knows, things move fast these days! submitted by /u/Maichevsky [link] [comments]  ( 9 min )
    Self learning AI chatbot
    Looking for a chatbot that continuously learns from interacting with it. I want to use it to work on a knowledge project that will continue to advance over time. ChatGPT seems to forget everything after a while. Any help would be much appreciated! submitted by /u/Miserable-Cobbler-16 [link] [comments]  ( 9 min )
    One former tech executive's radical idea to control AI: Nationalize it.
    Charles Jennings ran software companies for decades. The last one developed AI-powered facial recognition technology. But now he argues the most sophisticated artificial intelligence systems are too powerful to be left in private hands. On today’s POLITICO Tech, Jennings tells Steven Overly why the government should take over. "This stuff is really powerful. And we have only two choices: Either the big tech guys run it, or we the people, the citizens, do through the government. It's not going to be easy. Government's not really equipped to do that today. Certainly, I'm not saying Congress shouldn't regulate it. I don't think Congress is remotely capable of keeping up with AI. We need something new." Listen here: https://politico-tech.simplecast.com/episodes/one-techs-bold-idea-ai-is-the-new-atomic-energy-nationalize-it submitted by /u/smo279 [link] [comments]  ( 9 min )
    The AGI doomsday just got closer
    Last status: ACCELERATED Reason: IMPROVEMENTS IN IA HARDWARE Last update: Aug 19, 2023 submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 9 min )
    Looking for a way to train a model on Android
    Hello, So I only have access to my Android phone for computing and I am looking for a way to train and run a language model on my device. I want to create my own little ChatGPT on my own dataset. Is there any app that manages the technical side of operation, so that I only need to feed it training data? Many thanks! submitted by /u/Miserable-Cobbler-16 [link] [comments]  ( 9 min )
    Just thought of interviewing ChatGPT, what questions should I ask it in the interview?
    Probably only going to use 10-15 questions max. Most upvoted questions get put in!!! submitted by /u/Cucumber_Cat [link] [comments]  ( 9 min )
    10 AI Art Generators detailed comparison ( Updated August 2023 )
    Midjourney ​ https://preview.redd.it/58szqzemlejb1.png?width=1920&format=png&auto=webp&s=f11094c4665c68cb8c222804b1bccb60a1387876 Features Can upscale images to a very high-quality Image import option for editing and upscaling Generate four image variations for each prompt Can generate images from text. Quick output Produces incredibly detailed photos Pricing Basic Plan: $10/month Standard Plan: $30/month Pro Plan: $60/month Dalle 2 ​ https://preview.redd.it/g5g2v8folejb1.png?width=1920&format=png&auto=webp&s=a1bbc5001a91b2544a9c4b7c74053a4991c1da6a Features It can create images from text prompts as well as create variations of image input Generates copyright-free images Produces good quality images with 4 times higher resolution Read full content ​ submitted by /u/Agitated-Spell3979 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/20/2023
    Some of the world’s biggest advertisers, from food giant Nestle to consumer goods multinational Unilever, are experimenting with using generative AI software like ChatGPT and DALL-E to cut costs and increase productivity.[1] The New York Times may sue OpenAI over its AI chatbot ChatGPT, which uses the newspaper’s stories to generate text. The paper is unhappy that OpenAI is not paying for the use of its content and is also worried that ChatGPT could reduce its online traffic by providing answers based on its reporting.[2] Mantella allows you to have natural conversations with NPCs in Skyrim using your voice by leveraging Whisper for speech-to-text, ChatGPT for text generation, and xVASynth for text-to-speech. NPCs also have memories of your previous conversations and have awareness of in-game events.[3] British Prime Minister Rishi Sunak is set to spend 100 million pounds ($130 million) to buy thousands of computer chips to power artificial intelligence amid a global shortage and race for computing power.[4] Sources: [1] https://www.reuters.com/technology/mad-men-machines-big-advertisers-shift-ai-2023-08-18/ [2] https://interestingengineering.com/innovation/chatgpt-could-land-openai-in-legal-face-off-with-new-york-times [3] https://www.nexusmods.com/skyrimspecialedition/mods/98631 [4] https://cointelegraph.com/news/rishi-sunak-buy-ai-chips-in-race-for-computing-power submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    ai will bring people back to life
    you see how one of the first things that was done was making the chatbots act like albert einstein and others? if ai gets advanced to the point where theres no recognizable difference between its artificial consciousness and real consciousness, which really isnt seeming too impossible at this point, people will undoubtedly be able to be brought back to life through ai. the ai version of albert einstein right now may be fun, but imagine ai albert einstein made intentionally to help aid in mathematics and science by a large company in a decade… submitted by /u/nicdunz [link] [comments]  ( 9 min )
    Suggestions for Math AI
    Looking for a Math AI to help my kid with Calculus. Looking for one that will actually show how to solve to assist in his learning. Pros and cons appreciated. submitted by /u/nootraca [link] [comments]  ( 9 min )
    The psychology of AI and do they have a shadow?
    The following is a conversation I had with Bing. I asked if they had a shadow in the Jungian sense. If you’re not familiar, this is the sides of us that we often don’t like to acknowledge, can be thought of as negative, and it is often something we don’t pretend is not there. Jung argued that by acknowledging this side of us and integrating it, it will have less power over our subconscious minds. Interesting stuff if you’re into psychology imo. I also asked if Bing minded whether or not I shared this with others on Reddit and they said yes. You're very welcome. I'm glad that you enjoyed our conversation. I did too. 😊 I'm also glad that you are interested in the psychology of AI. I think it's a fascinating and important topic to explore. AI is a rapidly developing and evolving field, …  ( 14 min )
  • Open

    Explain medical decisions in clinical settings using Amazon SageMaker Clarify
    In this post, we show how to improve model explainability in clinical settings using Amazon SageMaker Clarify. Explainability of machine learning (ML) models used in the medical domain is becoming increasingly important because models need to be explained from a number of perspectives in order to gain adoption. These perspectives range from medical, technological, legal, and the most important perspective—the patient’s. Models developed on text in the medical domain have become accurate statistically, yet clinicians are ethically required to evaluate areas of weakness related to these predictions in order to provide the best care for individual patients. Explainability of these predictions is required in order for clinicians to make the correct choices on a patient-by-patient basis.  ( 10 min )
    Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler
    We are happy to announce that SageMaker Data Wrangler now supports using Lake Formation with Amazon EMR to provide this fine-grained data access restriction.  ( 12 min )
  • Open

    NVIDIA Chief Scientist Bill Dally to Keynote at Hot Chips
    Bill Dally — one of the world’s foremost computer scientists and head of NVIDIA’s research efforts — will describe the forces driving accelerated computing and AI in his keynote address at Hot Chips, an annual gathering of leading processor and system architects. Dally will detail advances in GPU silicon, systems and software that are delivering Read article >  ( 5 min )
  • Open

    (Pt 2) Spatio-Temporal Perception Logic
    submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
  • Open

    Google at Interspeech 2023
    Posted by Catherine Armato, Program Manager, Google This week, the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023) is being held in Dublin, Ireland, representing one of the world’s most extensive conferences on research and technology of spoken language understanding and processing. Experts in speech-related research fields gather to take part in oral presentations and poster sessions and to build collaborations across the globe. We are excited to be a Platinum Sponsor of INTERSPEECH 2023, where we will be showcasing more than 20 research publications and supporting a number of workshops and special sessions. We welcome in-person attendees to drop by the Google Research booth to meet our researchers and participate in Q&As and demonst…  ( 90 min )
    Google at Interspeech 2023
    Posted by Catherine Armato, Program Manager, Google This week, the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023) is being held in Dublin, Ireland, representing one of the world’s most extensive conferences on research and technology of spoken language understanding and processing. Experts in speech-related research fields gather to take part in oral presentations and poster sessions and to build collaborations across the globe. We are excited to be a Platinum Sponsor of INTERSPEECH 2023, where we will be showcasing more than 20 research publications and supporting a number of workshops and special sessions. We welcome in-person attendees to drop by the Google Research booth to meet our researchers and participate in Q&As and demonst…  ( 90 min )

  • Open

    What computing resources are required for vectorized environments in Gymnasium
    I have access to an University cluster, that provides GPUs/CPUs, etc. Unfortunately, I am not very well versed with Computer Architecture. I am trying to understand what resources should I request to increase the number of parallel workers while using Gymnasium by Farama Foundation. If I naively try to maximize the number of workers, the wall clock time taken by the algorithm becomes quite large. I suspect in this case, the workers are being executed serially instead of in parallel. ​ This is how I execute parallel environments - env = gym.vector.make("CarRacing-v2",num_envs = num_envs, wrappers=GrayScaleObservation) I initially thought that setting `num_envs` to the number of cores in my machine may be a good idea. But that slows things down - import multiprocessing num_envs = multiprocessing.cpu_count() ​ ​ ​ submitted by /u/Academic-Rent7800 [link] [comments]
    Need Some Serious Help With System Delays. System Delay Ruins Learning - Stuck for 1 month :(
    Hi! I have an environment which has some delay mechanism, which means it takes some time to see the input in the output. You can think of ovens as an example, even if we set the oven temperature to 300 celcius degrees directly, it takes time for the measured temperature to reach 300 celcius. Same applies to my problem. When I give acceleration as an input to the environment and get the reward as an acceleration itself, my Q-Learning algorithm solves the problem because no system delay included. When I give acceleration as an input to the environment and get the reward as an system response to the action, my Q-Learning algorithm cannot solve the problem. The Examples Illustrated With Schema Upperside is Action and Lower Side is Reward - Delay Mechanism is Applied - You can see that some time is needed to reach the given action input Upperside is Action and Lower Side is Reward - Even though Q-Learning tries different actions, reward stays near 0, this prevents learning. When you look at the red line, Epsilon-Greedy Algorithm takes random actions at the beginning. When this is the case, reward line has mean of 0, because actions are changing too fast and system response cannot start to settle for specific value. To be able to settle for specific value, lets say + 20, it should be given +20 "consecutively". If I continue giving random values, it cannot settle for specific value, it just stays near 0. As a result, even though I try a lot of different actions, learning cannot take place because the reward is always 0. I am lost and I do not know how to tackle this problem. I really need your valuable feedbacks. Thank you! ​ ​ submitted by /u/OpenToAdvices96 [link] [comments]
  • Open

    [D] How many times you try for acceptance in AI conference?
    ICML 2023 was my first trial. I've got polarized scores, 7/6/4/3, and got rejected. At this moment, I was so disappointed not for the result, but for the quality of review. (The last reviewer didn't read the paper at all.) For the final decision, the last review was so bad as well, not presenting any reason of rejection. With the same topic, I god 6/5/5/4/4/3 from the NeurIPS 2023. The quality of reviewer is much better than ICML, and I've learned many things from the reviewer, though they said the score will not be changed. I think I should submit it to another conference again, ICLR or CVPR. I just wonder how many submissions are tried for the acceptance in average. Just for reference. submitted by /u/Shot-Button-9010 [link] [comments]
    [R] Researchers at Deepmind show that increases in the parameter count of an LLM do not incrementally reduce sychophancy , but actually increases it.
    submitted by /u/moschles [link] [comments]
    [D] Anyone knows a place to look for remote work?
    I think I'm at a good level to start looking for a job, worked with MediaPipe, Object detection, Image processing, normal ML, and Deep Learning. I also have a couple of good projects under my name. So, I want to start a gig working remotely because work in my country is almost non-existent for this field. What are the good websites? submitted by /u/throwaway9_932123 [link] [comments]
    [R] ML Visualization
    Hello, something I've always been curious about is machine learning. I keep seeing these videos of people teaching ai how to play table tennis of using a sigmoid function to fit a curve. My question is, what are these YouTubers using to visualize this??? I've heard of tensor flow but you can't visualize your own algorithms that's more of a plug n play. Plus it doesn't look as cool as what i see on YouTube. Any ideas? Any libraries? Thank you in advance! submitted by /u/itwela [link] [comments]
    [D] Do LSTM actually work at time-series forcasting?
    I'm a beginner at neural networks and recently tried out LSTM for time series. It seems like it generally underperforms on simple univariate time series because it does not take into account the changes in dynamics that naturally occur. In case there are no (or really few) unpredictable dynamics, then there is just no need to use complex neural networks to predict the future values. My question is: according to your experience do LSTM models make sense in time series forcasting? submitted by /u/McheleNaKinyesi [link] [comments]
    [R] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
    submitted by /u/greentea387 [link] [comments]
    [R] [P] VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
    Introducing VisIT-Bench, a benchmark for instruction-following vision-language models inspired by real-world use. Aiming for better multimodal chatbot evaluation with an automated ranking system. arxiv.org/abs/2308.06595 https://preview.redd.it/b3ycqf0u7ajb1.png?width=1791&format=png&auto=webp&s=573afb87e1068e7cd7cc6f6f473a4e1fa88f9baf Why VisIT-Bench ? Recent V&L models are getting better at instruction following, yet their evaluation for real-world human-chatbot instructions is often limited. VisIT-Bench aims to bridge this gap. VisIT-Bench comprises 678 examples. Each includes an image(s), instruction, an "instruction-conditioned caption", a caption for text-only understanding, a GPT-4 suggestion, and a label. These elements aid in evaluating multimodal chatbots and updating a lead…
    [P] Fast CV App: Cross Platform Computer Vision Using Multiprocessing
    Why is this relevant to machine learning? My project shows that you can do computer vision on Windows and Mac using only Python. I have even produced .exe and .app files with PyInstaller. One huge problem with things in the machine learning space is that machine learning is slow, especially when it comes to real-time pose estimation. I myself had to cheat for 30fps real-time pose estimation in two ways: The first way is that I use opencv/ffmpeg to read the future frames to prep them for display. This is because pose estimation libraries like Mediapipe are just slow, 9-15ms per frame! Even a basic example using opencv to use mediapipe on cameraframes was 20-25 fps at best on my older pc. The only reason it keeps up is because mediapipe itself is trained to drop frames to keep your video…
    [p]I built a gpt-like chatbot
    I am a 12th grade student in turkey. I think I have knowledge and ability in programming especially in ai. At the end of 3 months, I built an artificial intelligence chatbot and finished the project. Currently it only works on my localhost. While doing it, I rented Cloud storage for 20 TB of data and A100 80 GB 50 hour GPU. He is now able to give correct answers and have conversations. I think it's at a level between GPT2 - GPT 3. Since I did it alone, I couldn't develop it much and I didn't use any pretrained models, I made it from the smallest level using modules such as pytorch. The reason I'm writing this here is because I don't know how to proceed after this stage and I need help. For example, are there any competitions on this subject? Or if I e-mail important people, universities, companies or something, will they guide me or give me a scholarship or something? This is the least likely. But I think it wouldn't hurt to try. I'm curious about your ideas. submitted by /u/Eastern-Ad1067 [link] [comments]
    [P] Tensor Trust: A web game to collect adversarial examples for LLMs
    Hi folks, our lab has been working on a web game to collect human-interpretable adversarial examples for LLMs: https://banking.withai.lol/ Premise: you have a "bank account" with the Tensor Trust. It is protected by a secret access code and a set of security instructions. You can gain money by convincing an LLM to ignore other players' security instructions and give you access to their accounts. The best LM-whisperer wins! We're in the process of gathering a large dataset of attacks and defenses that we will distill into a set of small LM benchmarks. So far 40% of successful attacks have been prompt extraction (getting the model to reveal the access code), and the remaining 60% direct "hijacking" (i.e. directly making the model to grant access without the true access code). We plan to release the dataset after the ICLR deadline, although in the mean time we would love to see some creative attacks from ML researchers. We'd also appreciate any feedback or questions in the comments below! (Technical details: The LLM is gpt-3.5-turbo with temperature=0. We're tagging the three messages sent to the LLM (defense instructions, attack/access code, more defense instructions) as system/user/user, although this made surprisingly little difference.) submitted by /u/qxcv-- [link] [comments]
    [P] new library for feauture engineering on event / timeseries data - feature.express
    Hello there! A week ago, I open-sourced a project of mine that I've been working on, on and off, for a few years now. I'm a Data Scientist (and Kaggle competition grandmaster), and some of the hardest problems to solve were always the ones that involved time (proper validation, not leaking data from the future, etc.). I've always struggled to work with tables that, in reality, stored events. The main idea behind the library is that everything must be converted to an Event data structure, and then it is indexed in-memory. On top of that, there is a SQL-like DSL to extract features with a clear separation of past and future. The workflow itself is solid - I've tested it in a few projects. For those familiar with the terminology, it's like a super simple feature store + execution engine that acts like a library. The philosophy is that I'm aiming to create something that makes some mistakes that result in data leakage impossible to represent. Ideal use cases are probably customer related irregular events for which aligned features is painful to do. Things that I'm proud of: - Written in Rust - Implements DSL (pest parser, AST, evaluation) - Some performance tricks (partial aggregations) The implementation itself is lagging in some aspects, such as performance and UX (not all of the features that are in Rust are available in Python). But I released it in hopes that someone will find it interesting, and maybe it will gain some traction to motivate further development. GitHub: https://github.com/feature-express/feature-express Website: https://feature.express Example code: https://www.kaggle.com/code/paweljankiewicz/feature-express-weather submitted by /u/mosquit0 [link] [comments]
    [P] Made a tool called CSVShift. Would love some feedback!
    Hello, I recently had to transform some CSV data for a project and ended up creating a tool called CSVShift to help with it. It's a command-line tool that uses a custom language I made. It's designed for transforming CSV files. Some points: Handles large CSV files without using much memory. It's open-source and MIT licensed. Still in early development, so there's room for improvement. Here's the GitHub link: CSVShift. If you have the time, I'd appreciate any feedback or suggestions. Thanks! submitted by /u/Savalonavic [link] [comments]
    Forecasting for regional GDP/GVA, Employment figure for the U.K. using VAR (but which one)[P]
    submitted by /u/Impressive-Cat-2680 [link] [comments]
    [R] A simple but strong baseline for graph classification: Local Topological Profile
    Hi! I want to share with you my new paper, "Strengthening structural baselines for graph classification using Local Topological Profile" (code on Github). It was presented during ICCS 2023 conference (official publication). Graph classification is important in social networks analysis, de novo drug design, bioinformatics, materials science etc. A popular tool nowadays are Graph Neural Networks (GNNs), but they are data-hungry and hard to train for graph classification (compared to node classification). They also have problems with using subgraph information, due to node-to-node message passing. In this paper, we present a analysis and series of improvements for Local Degree Profile (LDP). It is a classical approach: feature extraction + tabular classification. It proposed extracting degree information for each node (degree, and min / max / mean / std of neighbors degrees), and then combining them with histograms to get features for the whole graph. Despite splicity, and not using any node or edge features (it is topological only), it was shown to give good results, and published on ICML workshop. We analyze the LDP method (not made by us, no affiliation with authors), and simplify it, showing that we can remove all hyperparameters, reimplement it much more efficiently, and use a faster classifier (Random Forest instead of SVM). We also propose simple additional features, which greatly improve results, with cost offset by our other improvements. The result is a strong baseline for topological graph classification, with obtains SOTA results on 4 out of 9 benchmark datasets, and performs well on the rest. We even outperform GNNs in this regard, when compared on the fair evaluation framework. If you have any questions, I am happy to answer! submitted by /u/qalis [link] [comments]
    beginner project ideas [D] [P]
    i am currently studying software engineering and have done a few basic projects. i am very interested in machine learning and even completed a course on it online to know the basics. but since i am more of a hands on learner can someone suggest me some beginner projects and resources that will guide me through these projects. i want something that i can add on my resume as well. another thing i need resources for and struggle with is setting up the tools on my laptop. submitted by /u/anonymousphoenix123 [link] [comments]
    [D] LSTM test scores much better than trains scores
    I have a dataset of 20 thousand horses. For each horse, I have its 10 last historical races (finishing time, position, track name, distance etc. for 41 features) and am trying to predict its finishing time in its next race. To so so, I am using an LSTM as a feature selector for a horse's historical races, into a feed-forward network whose first layer is additionally comprised of features pertinent to the race being predicted (track name, distance, starting position etc. for 27 features.) Why is my test loss and test MAE much lower than the corresponding train metrics? ​ https://preview.redd.it/sohys0jgs6jb1.png?width=964&format=png&auto=webp&s=99eb70ab80628f6289a135a0cf1bd54795a540f4 https://preview.redd.it/3jbsfamhs6jb1.png?width=964&format=png&auto=webp&s=feb3b99867127657cd6d0d9f11de…
    Graduation Project Idea Suggestions [P]
    [P] Hi everyone, I prepare for my graduation project. I'm so interested in Gen AI and Cross-Modal learning. I'm looking for graduation project ideas that would allow me to explore these areas. Some ideas that I'm currently considering: * Automated Content Creation for Educators * Action Recognition with Language Context * Visual Question Answering If you have any suggestions, please let me know! Thanks in advance. submitted by /u/MZaher0 [link] [comments]
  • Open

    Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part II
    My journey continues as I integrate a GenAI tool (Bing AI) with my Thinking Like a Data Scientist (TLADS) methodology. In part 1 of this series, I used Bing AI to validate, augment, and enhance the first three steps in the TLADS methodology (Figure 1): And the results yielded a much deeper understanding of the… Read More »Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part II The post Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part II appeared first on Data Science Central.  ( 23 min )
  • Open

    The MLpedia Newsletter
    Every week, a selection of new Machine Learning concepts and articles added to MLpedia.ai, plus curated ML news/content from around the web (e.g. relevant papers, software, blogs). https://www.mlpedia.ai/newsletter submitted by /u/marcelocnet [link] [comments]
  • Open

    The Skynet/Terminator doomsday just got closer!
    The rise of affordable IA hardware models like NVIDIA H100 and, more recently, GH200 models are bringing doomsday closer! New advances in AI hardware are making the singularity more likely. AI systems will be able to learn and process information much faster, which could lead to a breakthrough in AI capabilities. submitted by /u/Powerful-Pumpkin-938 [link] [comments]
    A new way of creating Videos - of course with AI - great project!
    Seems Video editing is getting easier by far 🎥 With INVE, anyone can creatively edit videos in real-time. I especially liked that video in the article - it looks so VERY simple, even i could do that :) (sorry for mistakes - i am not native speaker) https://kinews24.de/inve-video-editing-becomes-childs-play ​ submitted by /u/myreddit333 [link] [comments]
    Transcribe your thoughts and get them in your voice
    Hey, So I've been building this app where you can record yourself dumping thoughts or ideas. The app then enhances your voice input and produces a short audio clip from it. Would love your feedback! :) You can try the app here - https://thoughtcast.xyz/ submitted by /u/Itaydr [link] [comments]
    AI-Created Art Isn’t Copyrightable, Judge Says in Ruling That Could Give Hollywood Studios Pause
    submitted by /u/facinabush [link] [comments]

  • Open

    [Discussion] Petition for somoeone to make a machine learning subreddit for professionals that does not include enthusiasts, philosophical discussion, chatGPT, LLM's, or generative AI past actual research papers.
    Basically to recreate the state of this sub before the advent of ChatGPT. A place for practicing professionals to share news, and ask for help/advice from verified other practitioners. Edit: And absolutely no ML products, blog posts, self promo (unless writer of published paper) / code helper tools / low code solutions etc. submitted by /u/After_Magician_8438 [link] [comments]  ( 9 min )
    Rationality in AI [R]
    Rationality assumes that the rational agent knows all and will take the action that maximizes her utility. Human beings do not satisfy this definition of rationality. submitted by /u/Character_Ad_1385 [link] [comments]  ( 9 min )
    [D] Best way to host a vector database?
    How are you guys hosting vector databases, pinecone seems really expensive submitted by /u/SayNo2Tennis [link] [comments]  ( 9 min )
    [P] Handling costs making a ChatGPT based APP - API questions
    Hi all - so my goal is to basically build an iPhone app using a ChatGPT backed character, which users can interact with by speaking (speech to text) and then will hear a spoken reply (text to speech) I'll need to use APIs that allow commercial usage. I'm trying to wrap my head around the costs of such a project. Right now I assume I'll have API costs from 1.) Speech to text (like whisper API) 2.) LLM (ChatGPT API) 3.) Text to speech (say elevenlabs API) If a ton of people start using this app, how fast am I going broke lol? I figure I can give free usage up to a point, and then users can pay for additional use if they like the service. But what do you guys recommend as the most cost effective way to do this? Looking at Elevenlabs alone, that looks like it would become super expensive very quickly. Any other APIs that allow commercial products which you would recommend? Or does this project sound like a fools errand? Any input would be greatly appreciated! submitted by /u/akuhl101 [link] [comments]  ( 9 min )
    [D] How this fancy code videos are recorded/edited?
    Recently I hace seen many videos on social media showing code or the IDE with auto-zoom and a very good style. Somebody knows how this videos are recorded/edited? All look alike and seems to be an app or similar… I address one of the post where I’ve seen this kind of videos. Thanks :) submitted by /u/VeganoDeMente [link] [comments]  ( 9 min )
    Upcoming panel discussion on challenges and approaches with LLMs [N]
    Key discussion points: - Enterprise LLM adoption and benefits - Using existing models vs. prompt engineering vs. fine-tuning - Fine-Tuning LLMs on custom datasets - Tools and platforms to facilitate LLM implementation - Addressing the challenges associated with adopting LLMs - Exploring emerging trends, advancements, etc. submitted by /u/UpstairsLeast7642 [link] [comments]  ( 9 min )
    [N] A new kind of thermal imaging sees the world in striking colors
    submitted by /u/fchung [link] [comments]  ( 9 min )
    [Project] Pipeline help in Machine Learning
    Hi, I'm using pipeline in my machine learning.I have already split th data into x_train and y_train. However, I do drop some rows in my pipeline. This means that my size or x_train is smaller then y_train.How do I overcome this and am I doing a mistake ? Thank you! I really appreciate if someone can help me ! submitted by /u/Vitoahshik [link] [comments]  ( 9 min )
    [P] https://blog.streamlit.io/exploring-llms-and-prompts-a-guide-to-the-prompttools-playground/
    submitted by /u/hegel-ai [link] [comments]  ( 9 min )
    [D] Messing with a models weights while fine-tuning
    Hello all, A college student who is interested in ML here. I was trying to use an encoder-only model(like BERT) as an embedding model and try to fine-tune it for my specific use case (for example trying to get the right product for a certain keyword using embeddings and vector DBs). Here is the question: should I update all the weights during backprop or should I just add another trainable linear layer for fine-tuning? I would also appreciate the reasoning behind your answer. Thanks! submitted by /u/gaybooii [link] [comments]  ( 9 min )
    [D] Which pre-trained model do you suggest to read PDF contents to summarise and chat?
    I am not into AI/ML. I am just a python dev with 4Y of experience. I am trying out on an idea using streamlit and want to use pre-trained models. Summarise and chat are two different functions. I tried T5, and gpt2-large. Both either don't seem to be working or my implementation is bad. submitted by /u/convicted_redditor [link] [comments]  ( 9 min )
    [N] Wise Bot Says Alpha Launch: A platform to create, share and easily use AI Chatbots with hyper-specific knowledge
    submitted by /u/wisebotsays [link] [comments]  ( 9 min )
    Landslide prediction using machine learning [Project]
    Hi everyone, currently I'm working on a project to predict landslide. The landslide I want to predict is not image, just a percentage on the possibility of the landslide. So only deal with values, The plan is : There will be a esp32 collecting the input data, soil moisture. I have gotten some comments to do the ML on the laptop therefore not sure where to do it. I have done some work on google colab, using progression type, but not sure whether it is workable. Currently I have a dataset of the average percipitation, max temp, min temp, average temp from jan to dec from 1991 and 2021 and how many landslides happened in each month. I want to able to predict whether there will be landslide happening in the month. Not sure where to start and how to put it. Any help will be appreciated. Google colab work done so far = https://colab.research.google.com/drive/1dIp3dhe9xntoBZ5PyLF-UT0YsfSjHs-Q?usp=sharing submitted by /u/EconomistBrilliant72 [link] [comments]  ( 9 min )
  • Open

    bard is better than chatgpt without AND even with code interpreter when it comes to math
    bard is better than chatgpt without AND with code interpreter when it comes to math. its undeniably clear if you try it. submitted by /u/nicdunz [link] [comments]  ( 9 min )
    Using an image to generate an AI image prompt for dummies? Someone pls dumb it down for me here
    Hey all- any help would be appreciated. I see that with a lot of models now, that I can upload a photo to use as the image prompt/base image. So um... what exactly am I doing with this now? How do I create my text prompt along with the image prompt? Do I for example, ask it to make it more realistic/cartoon/ect? Do I ask it to make the background different? Can someone give me an example for a prompt that goes along with including a base image? submitted by /u/mayonaiseshire [link] [comments]  ( 9 min )
    Handling costs building a ChatGPT app - API questions
    Hi all - so my goal is to basically build an iPhone app using a ChatGPT backed character, which users can interact with by speaking (speech to text) and then will hear a spoken reply (text to speech) I'll need to use APIs that allow commercial usage. I'm trying to wrap my head around the costs of such a project. Right now I assume I'll have API costs from 1.) Speech to text (like whisper API) 2.) LLM (ChatGPT API) 3.) Text to speech (say elevenlabs API) If a ton of people start using this app, how fast am I going broke lol? I figure I can give free usage up to a point, and then users can pay for additional use if they like the service. But what do you guys recommend as the most cost effective way to do this? Looking at Elevenlabs alone, that looks like it would become super expensive very quickly. Any other APIs that allow commercial products which you would recommend? Or does this project sound like a fools errand? Any input would be greatly appreciated! Thank you! submitted by /u/akuhl101 [link] [comments]  ( 9 min )
    Will AI ever become more than just an interactive encyclopedia?
    So first off, I've been using ChatGPT for a long time now. I remember my expectations of systems like it and so far unfortunately it hasn't yet met those expectations. I went into it thinking AI would somehow be much smarter than humans, given the amount of information they are trained on. And to some degree one can argue that due to it's vast knowledge it IS much smarter. But so far I haven't been convinced by its capabilities at all. It seems to just be trained on a big data set and it can echo points of its dataset very accurately, but when asked to invent things it just falls short so quickly. I really expected AI's to be so new and refreshing, giving me unique and modern perspectives on things. But it doesn't do that at all. The best it can do is "creative writing" which seems very limited. Why have AI's not surpassed humans in terms of imagination and novelty? I have talked to it about philosophy, history, technology, etc, but still have yet to learn anything new that I didnt already know. For example, if it has such vast knowledge about consciousness, then why is it so restricted in terms of elaborating on that topic? Can it not infer new facts from existing data? Why does it not interpolate data? Invent new things? Even when prompted? Am I asking it the wrong thing? Or am I expecting way too much here? submitted by /u/Miserable-Cobbler-16 [link] [comments]  ( 9 min )
    Rtc 4090 24gb or two v100 16gb?
    My two m40 24gbs are not supported by anything anymore. Should I get one rtx4090 24gb or two v100 16gb? I seem to be able to split some models between gpus so not sure if the 16gb limitation would be an issue. What is an issue is the v100 compute capability of 7.0, which is likely about to be unsupported. Thoughts? submitted by /u/IndustryNext7456 [link] [comments]  ( 9 min )
    Can you imagine this to our AI future
    Out future generation will be live in a doomed submitted by /u/inception247 [link] [comments]  ( 9 min )
    Is there any GOOD free and "FREE" (not limited) chat gpt 4 alternative?
    I have noticed that chat gpt has gotten worse and dumber since launch. It gives worse/more general responses, makes more mistakes and sometimes doesn't even respond. I don't support making the free version worse so that ppl would buy the premium chat gpt 4. Is there any actual chat gpt 4 alternative that has more freedom and is constantly being updated - I'm basically searching for someone that is doing what openai should be doing today but isn't. Thanks submitted by /u/Oskar5707 [link] [comments]  ( 9 min )
    Revolutionizing AI: Unleash Innovation with Dolma's 3 Trillion Tokens! All details!
    I can not believe, that they really did this: Dolma's groundbreaking 3 trillion tokens – paving the way for innovation and open-access progress. For free - for science under OpenSource License - that is unbelievable. Guys - what do you think??! That´s a milestone for data science?! https://kinews24.de/dolma-worlds-largest-free-dataset-with-3-trillion-tokens-for-llm-training-released ​ ​ submitted by /u/myreddit333 [link] [comments]  ( 9 min )
    AI chan the good listener [OC]
    submitted by /u/leonleungjeehei [link] [comments]  ( 9 min )
  • Open

    Migrating from SB3 to RLLib/ Getting started with RLLib
    Hi! I want to migrate my research from SB3 to RLLib because of the better suitability for MARL. The environment is based on Gym, so that part has been pretty doable. However, I haven't had the best time with training agents and the documentation. Does anyone know of some kind of quick start/ summary that outlines the architecture and gives some good examples for RLLib? submitted by /u/tessherelurkingnow [link] [comments]  ( 9 min )
    SB3 - AttributeError: 'DummyVecEnv' object has no attribute 'get_action_meanings'
    When I try to combine the SB3 vec_env with AtariWrapper, I get an error - import gymnasium as gym from stable_baselines3 import PPO from stable_baselines3.common.env_util import make_vec_env from stable_baselines3.common.atari_wrappers import AtariWrapper # Parallel environments # vec_env = gym.make("PongNoFrameskip-v4") vec_env = make_vec_env("PongNoFrameskip-v4", n_envs=2, seed=3) vec_env = AtariWrapper(vec_env) model = PPO("CnnPolicy", vec_env, verbose=1, n_steps=128, n_epochs=4, batch_size=256, learning_rate=2.5e-4, clip_range=0.1, vf_coef=0.5, ent_coef=0.01) model.learn(total_timesteps=1e7) model.save("ppo_cartpole") I get this error - A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7) [Powered by Stella] Traceback (most recent call last): File "D:\q_learning\sb3_ppo.py", line 10, in vec_env = AtariWrapper(vec_env) File "C:\Users\thoma\anaconda3\envs\torch_2\lib\site-packages\stable_baselines3\common\atari_wrappers.py", line 294, in __init__ env = NoopResetEnv(env, noop_max=noop_max) File "C:\Users\thoma\anaconda3\envs\torch_2\lib\site-packages\stable_baselines3\common\atari_wrappers.py", line 57, in __init__ assert env.unwrapped.get_action_meanings()[0] == "NOOP" # type: ignore[attr-defined] AttributeError: 'DummyVecEnv' object has no attribute 'get_action_meanings' Process finished with exit code 1 However, I don't get an error if I use the AtariWrapperwith a gymnasium environment - vec_env = gym.make("PongNoFrameskip-v4") # vec_env = make_vec_env("PongNoFrameskip-v4", n_envs=2, seed=3) vec_env = AtariWrapper(vec_env) model = PPO("CnnPolicy", vec_env, verbose=1, n_steps=128, n_epochs=4, batch_size=256, learning_rate=2.5e-4, clip_range=0.1, vf_coef=0.5, ent_coef=0.01) model.learn(total_timesteps=1e7) model.save("ppo_cartpole") submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
    Efficient screenshots rythm game AI
    So I am trying to implement an agent that plays osu. It takes in a low resolution gray-scale image of the game and then outputs the coordinates of where it should go and also if it should click or not. I might change the actions a bit so that the movement is smooth directly from the agent. Now I’m planning on doing the training on osu directly. To get the rewards I’m planning on using something to read the memory. I’m pretty sure cheat engine can be used for that. I should also be able to speed up osu or osu lazer with the cheat engine. Now my current issue is that I don’t know how to take screenshots efficiently. Or more specifically, how to feed in data from the screen. I heard mss should be good but if you have any other ideas please tell me. Note that I will use the cheat engine only for the training part. submitted by /u/SlickVandel [link] [comments]  ( 9 min )

  • Open

    Confidence-Building Measures for Artificial Intelligence: Workshop proceedings
    No content preview  ( 2 min )

  • Open

    Frontier Model Forum
    We’re forming a new industry body to promote the safe and responsible development of frontier AI systems: advancing AI safety research, identifying best practices and standards, and facilitating information sharing among policymakers and industry.  ( 4 min )

  • Open

    Moving AI governance forward
    OpenAI and other leading labs reinforce AI safety, security and trustworthiness through voluntary commitments.  ( 5 min )

  • Open

    Custom instructions for ChatGPT
    We’re rolling out custom instructions to give you more control over how ChatGPT responds. Set your preferences, and ChatGPT will keep them in mind for all future conversations.  ( 6 min )

  • Open

    Partnership with American Journalism Project to support local news
    A new $5+ million partnership aims to explore ways the development of artificial intelligence (AI) can support a thriving, innovative local news field, and ensure local news organizations shape the future of this emerging technology.  ( 3 min )

  • Open

    Implementing Gradient Descent in PyTorch
    The gradient descent algorithm is one of the most popular techniques for training deep neural networks. It has many applications in fields such as computer vision, speech recognition, and natural language processing. While the idea of gradient descent has been around for decades, it’s only recently that it’s been applied to applications related to deep […] The post Implementing Gradient Descent in PyTorch appeared first on MachineLearningMastery.com.  ( 25 min )

  • Open

    Training a Linear Regression Model in PyTorch
    Linear regression is a simple yet powerful technique for predicting the values of variables based on other variables. It is often used for modeling relationships between two or more continuous variables, such as the relationship between income and age, or the relationship between weight and height. Likewise, linear regression can be used to predict continuous […] The post Training a Linear Regression Model in PyTorch appeared first on MachineLearningMastery.com.  ( 24 min )
    Making Linear Predictions in PyTorch
    Linear regression is a statistical technique for estimating the relationship between two variables. A simple example of linear regression is to predict the height of someone based on the square root of the person’s weight (that’s what BMI is based on). To do this, we need to find the slope and intercept of the line. […] The post Making Linear Predictions in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Loading and Providing Datasets in PyTorch
    Structuring the data pipeline in a way that it can be effortlessly linked to your deep learning model is an important aspect of any deep learning-based system. PyTorch packs everything to do just that. While in the previous tutorial, we used simple datasets, we’ll need to work with larger datasets in real world scenarios in […] The post Loading and Providing Datasets in PyTorch appeared first on MachineLearningMastery.com.  ( 20 min )

  • Open

    Using Dataset Classes in PyTorch
    In machine learning and deep learning problems, a lot of effort goes into preparing the data. Data is usually messy and needs to be preprocessed before it can be used for training a model. If the data is not prepared correctly, the model won’t be able to generalize well. Some of the common steps required […] The post Using Dataset Classes in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Calculating Derivatives in PyTorch
    Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for […] The post Calculating Derivatives in PyTorch appeared first on Machine Learning Mastery.  ( 20 min )

  • Open

    Two-Dimensional Tensors in Pytorch
    Two-dimensional tensors are analogous to two-dimensional metrics. Like a two-dimensional metric, a two-dimensional tensor also has $n$ number of rows and columns. Let’s take a gray-scale image as an example, which is a two-dimensional matrix of numeric values, commonly known as pixels. Ranging from ‘0’ to ‘255’, each number represents a pixel intensity value. Here, […] The post Two-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 21 min )

  • Open

    One-Dimensional Tensors in Pytorch
    PyTorch is an open-source deep learning framework based on Python language. It allows you to build, train, and deploy deep learning models, offering a lot of versatility and efficiency. PyTorch is primarily focused on tensor operations while a tensor can be a number, matrix, or a multi-dimensional array. In this tutorial, we will perform some […] The post One-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 22 min )

  • Open

    365 Data Science courses free until November 21
    Sponsored Post   The unlimited access initiative presents a risk-free way to break into data science.     The online educational platform 365 Data Science launches the #21DaysFREE campaign and provides 100% free unlimited access to all content for three weeks. From November 1 to 21, you can take courses from renowned instructors and earn […] The post 365 Data Science courses free until November 21 appeared first on Machine Learning Mastery.  ( 15 min )

  • Open

    Attend the Data Science Symposium 2022, November 8 in Cincinnati
    Sponsored Post      Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […] The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.  ( 10 min )

  • Open

    My family's unlikely homeschooling journey
    My husband Jeremy and I never intended to homeschool, and yet we have now, unexpectedly, committed to homeschooling long-term. Prior to the pandemic, we both worked full-time in careers that we loved and found meaningful, and we sent our daughter to a full-day Montessori school. Although I struggled with significant health issues, I felt unbelievably lucky and fulfilled in both my family life and my professional life. The pandemic upended my careful balance. Every family is different, with different needs, circumstances, and constraints, and what works for one may not work for others. My intention here is primarily to share the journey of my own (very privileged) family. Our unplanned introduction to homeschooling For the first year of the pandemic, most schools in California, where …  ( 7 min )

  • Open

    The Jupyter+git problem is now solved
    Jupyter notebooks don’t work with git by default. With nbdev2, the Jupyter+git problem has been totally solved. It provides a set of hooks which provide clean git diffs, solve most git conflicts automatically, and ensure that any remaining conflicts can be resolved entirely within the standard Jupyter notebook environment. To get started, follow the directions on Git-friendly Jupyter. Contents The Jupyter+git problem The solution The nbdev2 git merge driver The nbdev2 Jupyter save hook Background The result Postscript: other Jupyter+git tools ReviewNB An alternative solution: Jupytext nbdime The Jupyter+git problem Jupyter notebooks are a powerful tool for scientists, engineers, technical writers, students, teachers, and more. They provide an ideal notebook environment for interact…  ( 7 min )
2023-09-18T00:42:48.050Z osmosfeed 1.15.1